The Data Warehouse Landscape - Q4 2013
The Information Difference Landscape is a high-level assessment of the main and most innovative vendors in a market at a point in time. The diagram shows three dimensions. The size of the bubble is an indication of the customer base of the vendor, i.e. the number of corporations it has sold to, adjusted for deal size. The larger the bubble, the broader the customer base, though the bubbles are by no means to scale. The technology dimension position is derived from a weighted set of scores based on four factors: customer satisfaction as measured by a survey of reference customers, analyst impression of the technology, maturity of the technology, and breadth of the technology in terms of its coverage against our functionality model. The market strength position is derived from a weighted set of scores based on five factors: data warehouse revenues, growth, financial strength, breadth of partner network and geographic coverage.
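As a rough illustration of how a weighted set of factor scores can be combined into a single axis position, here is a minimal Python sketch. The weights, the 1 to 5 scale, the example scores and the function name `weighted_score` are all hypothetical; the actual weights used for the Landscape are not stated here.

```python
# Hypothetical weights for the four technology factors (assumed, not the real ones).
TECHNOLOGY_WEIGHTS = {
    "customer_satisfaction": 0.4,
    "analyst_impression": 0.2,
    "maturity": 0.2,
    "breadth": 0.2,
}


def weighted_score(scores, weights):
    """Return the weighted average of the factor scores."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[factor] * w for factor, w in weights.items()) / total_weight


# Made-up scores for an imaginary vendor, on an assumed 1-5 scale.
example_vendor = {
    "customer_satisfaction": 4,
    "analyst_impression": 3,
    "maturity": 5,
    "breadth": 2,
}
technology_position = weighted_score(example_vendor, TECHNOLOGY_WEIGHTS)
```

The same function would serve for the market strength axis by passing a second weight table covering its five factors.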
However, it is clear that Big Data is in its infancy. A December 2013 Information Difference survey with 178 responses found that 27% of organizations had live “Big Data” implementations, but that many of these respondents were struggling to put together a business case for them. What was particularly intriguing was that the volumes of data being reported as “Big Data” were not especially large: just 5% of respondents had over a petabyte of data to deal with, and only a further 4% had over 200TB. Indeed, the most common volume of “big” data in the survey was in the 10TB range, well within the scope of current database technologies.
It seems likely to us that Hadoop will evolve as a largely complementary offering to the traditional data warehouse, with structured data still handled by traditional databases, but machine-generated data and web data increasingly stored using Hadoop and related technologies. Tool vendors are scrambling to adapt their technologies to offer at least some form of Hadoop support, so we expect that Hadoop will appear alongside, rather than as a replacement for, data warehouses in most corporate architectures.
Within the data warehouse world, the largest vendors remain Oracle, IBM, Microsoft and Teradata, with Greenplum (now part of EMC) and SAS Institute being other large-scale providers. Increasingly, but not exclusively, columnar approaches are used for large-scale data warehouses. This approach, which allows much greater data compression at the expense of lengthier load times and more problematic concurrent update performance, is very well suited to the (mostly) read-only environment of data warehousing. Pioneered by Sybase, columnar databases such as those of Calpont are now well established as a good approach to high-volume data warehouses, and are often combined with massively parallel processing to take advantage of commodity server hardware.
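To see why a columnar layout tends to compress better, consider this small Python sketch. It applies run-length encoding, used here only as one simple stand-in for the compression schemes real columnar databases employ, to a made-up fact table stored column by column and then row by row. The table contents and the helper name `run_length_encode` are illustrative assumptions.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical fact table: (region, sale_date, amount), sorted by region.
rows = [
    ("EMEA", "2013-12-01", 100),
    ("EMEA", "2013-12-01", 250),
    ("EMEA", "2013-12-02", 75),
    ("APAC", "2013-12-02", 300),
    ("APAC", "2013-12-02", 120),
]

# Columnar layout: each column is stored contiguously, so repeats sit together.
regions, sale_dates, amounts = (list(column) for column in zip(*rows))
region_runs = run_length_encode(regions)
date_runs = run_length_encode(sale_dates)

# Row layout: values from different columns are interleaved, so there are
# no adjacent repeats for the encoder to exploit.
row_major = [value for row in rows for value in row]
row_runs = run_length_encode(row_major)

print(len(regions), "region values stored as", len(region_runs), "runs")
print(len(row_major), "row-major values stored as", len(row_runs), "runs")
```

The sketch also hints at the cost mentioned above: adding one new row means touching every column separately, which is part of why loads and concurrent updates are harder in a columnar store.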
Companies are increasingly contemplating moving their data warehouses into the cloud. A pioneer in this area was Kognitio, and other vendors now offer this possibility, although on-premise remains the dominant data warehouse paradigm. The steadily declining price of flash memory means that larger and larger data warehouse applications can be dealt with in memory rather than on slower (but cheaper) disk. The major vendors of hardware appliances are taking advantage of this, although the trend is offset by the steady increase in overall data volumes.
In addition to companies providing database technologies and associated data warehouse technologies, there are a couple providing data warehouse offerings that use third-party databases, one being WhereScape. The other, Kalido, was acquired in late 2013 by a Texas company called Silverback. Given the lack of clarity around its future direction, we have dropped it from the chart until it becomes clear what its new owner intends for the technology.
Overall, the data warehouse market has never been more exciting from a technology point of view. A wide range of technologies is now being applied to satisfy our insatiable appetite for more data, and our need to make sense of that growing volume of data.
With such different sub-markets, it is important that end-users carefully consider the alternatives appropriate to their particular needs. High-level overviews of the market, such as this Landscape, cannot capture specific customer requirements, and any technology selection process should be discussed in detail with an analyst.
As part of the research process vendors were asked to provide customer references, which were sent a survey on their satisfaction with the vendor’s products (if the vendor failed to provide sufficient references, a neutral score was assigned). Based on this survey, the data warehouse vendor with the happiest customers in 2013 was Kognitio, followed very closely by Teradata, then Calpont, SAS and InfoBright.
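The neutral-score rule described above can be sketched as follows. The scale midpoint, the minimum number of references and the function name `satisfaction_score` are hypothetical values chosen for illustration, not the figures used in the research.

```python
from statistics import mean

NEUTRAL_SCORE = 3.0   # assumed midpoint of a 1-5 satisfaction scale
MIN_REFERENCES = 3    # assumed minimum number of customer references


def satisfaction_score(survey_scores):
    """Average the reference-customer scores, or fall back to a neutral score."""
    if len(survey_scores) < MIN_REFERENCES:
        # Too few references were provided, so the vendor is neither
        # rewarded nor penalised on this factor.
        return NEUTRAL_SCORE
    return mean(survey_scores)
```

Under these assumptions a vendor with only two glowing references would still receive the neutral score rather than the average of those two responses.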
Below is a set of vendors who provide data warehouse technology, many of which are in addition to those covered in our core research.
| Vendor | Description | Website |
|---|---|---|
| Actian | Actian's Vectorwise product is an analytic database on commodity hardware. | www.actian.com |
| Algebraix Data | Analytic database running on SMP boxes. | www.algebraixdata.com |
| Calpont | Provides a column-oriented, MPP analytic database called InfiniDB. | www.calpont.com |
| Cloudera | Provides a distribution of the Hadoop data management platform. | www.cloudera.com |
| Exasol | German data warehouse appliance vendor. | www.exasol.com |
| Greenplum | Appliance vendor aiming at high-end warehouses, now part of EMC. | www.greenplum.com |
| IBM | InfoSphere Balanced Warehouse (formerly DB2) is the data warehouse software offering from the industry giant, which also offers two appliances: PureData for Operational Analytics (based on DB2) and PureData for Analytics, powered by Netezza technology. | www.ibm.com |
| InfoBright | Provides a columnar-database analytics platform. | www.infobright.com |
| Kognitio | Mature data warehouse appliance vendor that also offers its data warehouse as a service. | www.kognitio.com |
| Kalido | Not an appliance, but rather an application to generate data warehouses that adapt to change, running on various database platforms. | www.kalido.com |
| Microsoft | As well as offering its SQL Server relational database, Microsoft acquired DATAllegro and at the end of 2010 launched its Parallel Data Warehouse based on this technology. | www.microsoft.com |
| MonetDB | MonetDB is an open-source database system for high-performance applications. | monetdb.cwi.nl |
| Neo4j | Open-source graph database. | www.neo4j.org |
| Oracle | As well as its well-established database, Oracle offers the Exadata warehouse appliance. | www.oracle.com |
| ParAccel | Provides a column-oriented database appliance. | www.paraccel.com |
| Sand | Focuses on allowing customers to effectively retain massive amounts of compressed data in a near-line repository for extended periods. | www.sand.com |
| SAP/Sybase | Sybase, a pioneer in column-oriented analytic database technology, was acquired in mid-2010 by giant SAP. SAP also now offers the in-memory database technology HANA. | www.sap.com |
| SAS Institute | Comprehensive data warehouse technology from the largest privately owned software company in the world. | www.sas.com |
| 1010 Data | Provides a column-oriented database and web-based data analysis platform. | www.1010data.com |
| Teradata | Arguably the original pioneer of the data warehouse appliance. | www.teradata.com |
| Vertica | Appliance vendor Vertica was purchased by HP in 2011. | www.vertica.com |
| WhereScape | Not an appliance, but a framework for the development and support of data warehouses. | www.wherescape.com |