Saturday, June 25, 2011

The real (potential) impact of SAP HANA

Much has been written about SAP HANA. The technology has been variously described as "transformative" and "wacko." Well, which is it?

I have a few disclosures to make before continuing my analysis and comments on HANA:
  1. I worked at SAP for six years, as well as eight years at Oracle (plus also at Ingres before that).
  2. I was at SAP when the technology underlying HANA was acquired, though I am referring to and using no trade secrets or proprietary information in preparing this analysis.
  3. I attended this year's SAPPHIRE conference in Orlando, and SAP paid for my airfare and hotel.
Relational Databases
Relational databases have dominated the commercial information processing world for twenty years or more. There are many good reasons for this success.
  1. Relational databases are suitable for a broad range of applications.
  2. Relational databases can enable access to data relatively efficiently even if the query was not initially envisioned when the database was designed.
  3. Today's relational databases are economical, available on a broad range of hardware and operating systems, generally compatible across vendors, performant for many queries, scalable to fairly large data volumes without resorting to partitioning, suitable for partitioning when larger scale is required, based on open standards, mature, and stable.
  4. There are a large number of developers, administrators, designers, and an ecosystem of service providers who are very knowledgeable about today's popular relational databases, and who are available at economic rates of pay.
NoSQL, Columnar, and In-Memory Trends
There is an emerging trend towards databases that are designed to solve specific problems. While relational databases are good for solving many problems, it is easy to conceive of specific problems that are not well-solved by general-purpose databases. Relational databases are well-suited to handling structured data where the schema does not change, where text processing is not an important requirement, where data is measured in gigabytes rather than petabytes, where geographical or time-series (e.g., stream) processing is not required, and where the server does not need to support transactional and decision-support queries simultaneously.

Some problems do not fit those criteria. In some data sets, the schema varies from record to record, or over time. Text, image, "blob," or geographical data may be a dominant data type. More and more frequently, applications manage "big data" - huge volumes of data from millions of users or sensors. Some applications require simultaneous access to data for transactional updates as well as for aggregation in decision-support queries. For all of these cases, advanced architects and developers are looking at specialized data stores and data processing systems such as Hadoop, Cassandra, MongoDB, and others. These domain-specific data stores are known as "NoSQL" databases.

There is some controversy over whether NoSQL means "no SQL" or "Not Only SQL." Regardless, non-relational stores such as Hadoop are growing in popularity, but they are not really a replacement for relational data stores. A key property of most commercial relational databases is compliance with a set of principles called "ACID," which essentially guarantees that database transactions occur reliably. Many NoSQL databases use techniques like "eventual consistency" to improve performance at the cost of temporarily inconsistent data - a sacrifice that is unsuitable for most business applications. After all, if you deposit money in a bank account, you want it to be available for withdrawal right away, not "eventually."
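To make the ACID point concrete, here is a minimal sketch of atomicity using SQLite as a stand-in relational store (the table, account names, and business rule are hypothetical, purely for illustration): a transfer that would overdraw an account is rolled back in its entirety, so readers never see a half-finished transaction.

```python
import sqlite3

# In-memory SQLite database with a toy accounts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # commits on success, rolls back the whole transaction on error
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
        # A business rule: balances may not go negative.
        row = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
        if row[0] < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
except ValueError:
    pass  # transaction rolled back; neither UPDATE is visible

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0} -- both accounts unchanged
```

An eventually consistent store offers no such guarantee: a reader might briefly observe the debit without the matching credit.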

Another trend in the database world is towards new methods of storing data, without eliminating the ACID properties that business applications need, and without sacrificing the SQL language that is so well-known and widely supported. Two specific approaches are quite popular these days - columnar storage and in-memory databases.

Column stores, such as HP's Vertica or SAP Sybase IQ, store data by column. By contrast, traditional SQL databases store data as rows. The benefit of storing data as rows is that it is often the fastest way to look up a single value, such as salary, given a key value like the employee ID.

Columnar databases group data by column. Within a column, generally speaking, all the data is of the same type. A columnar store therefore keeps data of a single type together, which opens the door to significant compression. Good compression reduces disk space requirements, memory requirements, and access times.
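The compression advantage is easy to see in a toy sketch (this is illustrative only, not HANA's actual storage format): pivoting rows into columns puts like-typed, often repetitive values side by side, where a simple scheme such as run-length encoding collapses them dramatically.

```python
# Toy row store: one dict per record (column names are hypothetical).
rows = [
    {"id": 1, "dept": "sales", "salary": 50000},
    {"id": 2, "dept": "sales", "salary": 55000},
    {"id": 3, "dept": "sales", "salary": 52000},
    {"id": 4, "dept": "hr",    "salary": 48000},
]

# Column layout: one list per attribute, each holding a single data type.
columns = {key: [r[key] for r in rows] for key in rows[0]}

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded

print(run_length_encode(columns["dept"]))
# [('sales', 3), ('hr', 1)] -- four values stored as two pairs
```

In a row layout the repeated "sales" values are interleaved with ids and salaries, so this kind of compression has far less to work with.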

In-memory databases take advantage of two hardware trends: a significant reduction in the cost of RAM, and a significant increase in the amount of addressable memory in today's computers. It is now possible, and economically feasible, to put an entire database in memory for fast data management and query. Using columnar or other compression approaches, even larger data sets can be loaded entirely into main memory. With high-speed access to memory-resident data, more users can be supported on a single machine. An in-memory database can also serve transactional and decision-support queries from a single machine, meaning there can be zero latency between data appearing in the system and that data being available to decision-support applications. In a traditional setup, where data resides in the operational store and is then extracted into a data warehouse for reporting and analysis, there is always a lag between data capture and its availability for analysis.
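The zero-latency idea can be sketched in a few lines (a deliberately naive in-memory structure, not HANA's engine; the function and column names are invented for illustration): because the transactional write and the analytical query share one in-memory store, there is no extract-and-load step between them.

```python
# In-memory "table" of (region, amount) rows, shared by both workloads.
sales = []

def record_sale(region, amount):
    """Transactional write: append a new sale."""
    sales.append((region, amount))

def total_by_region(region):
    """Decision-support query: aggregate over the same live data."""
    return sum(amount for r, amount in sales if r == region)

record_sale("EMEA", 1200)
record_sale("APAC", 800)
print(total_by_region("EMEA"))  # 1200 -- visible the instant it is written
record_sale("EMEA", 300)
print(total_by_region("EMEA"))  # 1500 -- no ETL lag before analysis
```

In the traditional warehouse architecture, the second query would not reflect the latest sale until the next scheduled extract ran.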

Several years ago, SAP acquired Transactions In Memory, a company that had developed an in-memory database. Over the years since, at virtually each annual SAPPHIRE conference, SAP has discussed how this in-memory technology would revolutionize business computing, but I personally found the explanations to be somewhat short on convincing details.

Even the name, HANA, has changed in meaning over the years. Initially, the name stood for "Hasso's New Architecture" (and a beautiful vacation spot in Maui, Hawaii) and referred only to the software. Today, HANA stands for High-Performance Analytical Appliance, and refers to the software and the hardware appliance on which it is shipped. In addition, HANA has evolved from a data warehousing database into a more general-purpose platform.

SAP HANA does manage data in memory, for nearly incredible performance in some applications, but it also persists that data on disk, making it suitable for analytical applications and transactional applications - simultaneously. But HANA's capabilities do not end there, and that may be the key to HANA's long-term value.

In the short-term, it seems that SAP still struggles to generate references for HANA, other than in a narrow set of custom data-warehouse-type analytics. That may obscure where HANA can really deliver its first market successes.

When HANA is generally available, it is expected to include both SQL and MDX interfaces, meaning it can be dropped into Business Objects environments to dramatically improve performance. Some Business Objects analyses, whether in the Business Objects client or in Excel, can achieve orders-of-magnitude performance improvements with very little effort. Imagine reports that used to take a minute to run now running instantaneously. Imagine the satisfaction of your BOBJ user community if all or most of their reports and analyses ran instantaneously. Line-of-business users will pay for this capability, and that will open the door for SAP HANA in Business Objects accounts. Once HANA is in the door, I'm sure the CIO will find plenty of additional uses for it. This is huge, and it will generate truckloads of money for SAP while making customers super-satisfied.

And think of what SAP HANA means for competitive comparisons with Oracle, SAP's arch-rival. Larry wants to sell you Exalogic and Exadata machines costing millions; Hasso wants to sell you a simple, low-end, commodity device delivering the same benefits. If I were SAP, I'd have sales reps with HANA software installed on their laptops, demonstrating it at every customer interaction, comparing it (favorably) with Oracle Exadata, and suggesting that customers demand that Oracle sales reps bring an Exadata box on their next sales call - and not bother showing up without one. Larry wants to sell you a cloud in a box; SAP will sell you apps in the cloud, or analytics in a box, at one-hundredth to one-thousandth the cost of Oracle's solution.

The longer-term benefits of HANA will require new software to be written - software that takes advantage of objects managed in main memory, with logic pushed down into the HANA layer. I'll post more on this potential in the future, but just think of what instantaneous processing of enormous data sets will mean for business - continuous supply chain optimization, real-time pricing, automated and excellent customer service, and much more.

In the long run, SAP HANA may indeed revolutionize enterprise business applications, but that remains to be seen. Right now, SAP HANA should be capable of creating substantial customer benefits - and generating a very large revenue stream to SAP.