
The Data Warehouse Landscape - Q4 2013

The Information Difference Landscape is a high-level assessment of the main and most innovative vendors in a market at a point in time. The diagram shows three dimensions. The size of each bubble indicates the vendor's customer base, i.e. the number of corporations it has sold to, adjusted for deal size: the larger the bubble, the broader the customer base, though the bubbles are by no means to scale. The position on the technology dimension is derived from a weighted set of scores based on four factors: customer satisfaction, as measured by a survey of reference customers; analyst impression of the technology; maturity of the technology; and breadth of the technology in terms of its coverage against our functionality model. The position on the market strength dimension is derived from a weighted set of scores based on five factors: data warehouse revenues, growth, financial strength, breadth of partner network and geographic coverage.
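To make the idea of a "weighted set of scores" concrete, here is a minimal sketch of how a vendor's two coordinates could be computed. The actual weights and scoring scale are not published here, so the weights, the 0-10 scale, the factor names and the vendor scores below are all hypothetical. The only rule taken from the methodology is that a vendor supplying too few customer references receives a neutral satisfaction score.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical weights: the methodology says the factors are weighted but
# does not publish the weights, so these values are illustrative only.
TECHNOLOGY_WEIGHTS = {
    "customer_satisfaction": 0.3,
    "analyst_impression": 0.3,
    "maturity": 0.2,
    "functional_breadth": 0.2,
}
MARKET_STRENGTH_WEIGHTS = {
    "dw_revenue": 0.3,
    "growth": 0.2,
    "financial_strength": 0.2,
    "partner_network": 0.15,
    "geographic_coverage": 0.15,
}

# Assumed 0-10 scale; a missing factor score (e.g. too few customer
# references) falls back to this neutral midpoint.
NEUTRAL_SCORE = 5.0


def weighted_score(scores: Mapping[str, Optional[float]],
                   weights: Mapping[str, float]) -> float:
    """Weighted average of factor scores, substituting the neutral score
    for any factor that is missing or None."""
    total_weight = sum(weights.values())
    total = 0.0
    for factor, weight in weights.items():
        value = scores.get(factor)
        if value is None:
            value = NEUTRAL_SCORE
        total += weight * value
    return total / total_weight


def landscape_position(tech_scores: Mapping[str, Optional[float]],
                       market_scores: Mapping[str, Optional[float]]
                       ) -> Tuple[float, float]:
    """Return the (technology, market strength) coordinates of one vendor."""
    return (weighted_score(tech_scores, TECHNOLOGY_WEIGHTS),
            weighted_score(market_scores, MARKET_STRENGTH_WEIGHTS))


# Hypothetical vendor that supplied too few customer references.
tech = {"customer_satisfaction": None, "analyst_impression": 8,
        "maturity": 9, "functional_breadth": 7}
market = {"dw_revenue": 4, "growth": 8, "financial_strength": 6,
          "partner_network": 5, "geographic_coverage": 5}
x, y = landscape_position(tech, market)
print(f"technology={x:.2f}, market strength={y:.2f}")  # prints technology=7.10, market strength=5.50
```

Normalising by the sum of the weights keeps the result on the same scale as the inputs even if the chosen weights do not add up to exactly 1. The third dimension, bubble size, is a separate measure of customer base and is not part of this calculation.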

The data warehouse faces new challenges in 2014, with the sheer rise in data volumes presenting problems for established vendors and opportunities for new ones. In 2003 the largest data warehouse in the world was 30 TB in size, yet there are now many examples of petabyte-sized operational data warehouses, a more than 30-fold increase in just a decade. A 2012 Information Difference survey of 209 customers showed that most were experiencing data growth of 20-50% annually. This growth, together with companies' desire to analyse newer forms of data such as web traffic, sensor data and other machine-generated data, has meant that a lot of attention is being given to so-called “Big Data” and to newer approaches to tackling it, such as Hadoop. Hadoop and similar technologies are well suited to handling this “semi-structured” data, and can do so very economically compared to traditional database technologies.
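The growth figures above can be put on a common footing with a little compound-growth arithmetic. This sketch assumes decimal units (1 PB = 1,000 TB) and treats "petabyte-sized" as exactly 1 PB, which makes the result a lower bound; the function names are purely illustrative.

```python
import math

TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1,000 TB


def fold_increase(start: float, end: float) -> float:
    """How many times larger `end` is than `start`."""
    return end / start


def compound_annual_growth(start: float, end: float, years: float) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


def doubling_time(annual_rate: float) -> float:
    """Years for a data volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


# Largest warehouse: 30 TB in 2003 versus (at least) 1 PB a decade later.
growth = fold_increase(30, 1 * TB_PER_PB)
rate = compound_annual_growth(30, 1 * TB_PER_PB, 10)
print(f"{growth:.1f}x over 10 years, about {rate:.0%} a year")

# The 2012 survey's range of 20-50% annual data growth.
for r in (0.20, 0.50):
    print(f"{r:.0%} a year: doubles every {doubling_time(r):.1f} years")
```

Under these assumptions the jump from 30 TB to 1 PB works out to roughly 42% compound growth a year, and the survey's 20-50% range implies data volumes doubling roughly every 1.7 to 3.8 years.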

However, it is clear that Big Data is in its infancy. A December 2013 Information Difference survey with 178 responses found that 27% of organizations had live “Big Data” implementations, but many of these respondents were struggling to put together a business case for it. What was particularly intriguing was that the volumes of data being reported as “Big Data” were not especially large: just 5% of respondents had over a petabyte of data to deal with, and only a further 4% had over 200 TB. Indeed, the most common volume of “big” data in the survey was around 10 TB, well within the scope of current database technologies.

It seems likely to us that Hadoop will evolve as a largely complementary offering to the traditional data warehouse, with structured data still handled by traditional databases, but machine-generated data and web data increasingly stored using Hadoop and related technologies. Tool vendors are scrambling to adapt their technologies to offer at least some form of Hadoop support, so we expect Hadoop to appear alongside data warehouses, rather than as a replacement for them, in most corporate architectures.

Within the data warehouse world, the largest vendors remain Oracle, IBM, Microsoft and Teradata, with Greenplum (now part of EMC) and SAS Institute being other large-scale providers. Increasingly, though not exclusively, columnar approaches are used for large-scale data warehouses. Storing data by column rather than by row allows much greater data compression at the expense of lengthier load times and poorer concurrent update performance, a trade-off that is very well suited to the (mostly) read-only environment of data warehousing. Pioneered by Sybase, columnar databases such as Calpont's are now well established as a good approach to high-volume data warehouses, often combined with massively parallel processing (MPP) to take advantage of commodity server hardware.
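Why columnar storage compresses well, and why it makes loads and updates more awkward, can be shown with a toy example. This is a minimal sketch on hypothetical data, not how Sybase, Calpont or any other vendor actually implements storage; run-length encoding is used here as one simple compression scheme, and real engines combine several.

```python
from itertools import groupby
from typing import Any, List, Sequence, Tuple

# Hypothetical fact table of (region, product, units), sorted by region
# as a bulk warehouse load often would be.
ROWS: List[Tuple[str, str, int]] = [
    ("EMEA", "widget", 10), ("EMEA", "widget", 12),
    ("EMEA", "gadget", 7), ("EMEA", "gadget", 7),
    ("APAC", "widget", 5), ("APAC", "widget", 9),
    ("APAC", "gadget", 3), ("APAC", "gadget", 3),
]


def to_columns(rows: Sequence[Tuple[Any, ...]]) -> List[List[Any]]:
    """Pivot row-oriented tuples into one list per attribute (a column store)."""
    return [list(col) for col in zip(*rows)]


def run_length_encode(values: Sequence[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def append_row(columns: List[List[Any]], row: Tuple[Any, ...]) -> None:
    """Insert one row: each column is touched separately, and in a compressed
    store each column's encoding may also need rebuilding."""
    for column, value in zip(columns, row):
        column.append(value)


columns = to_columns(ROWS)
for name, column in zip(("region", "product", "units"), columns):
    encoded = run_length_encode(column)
    print(f"{name}: {len(column)} values -> {len(encoded)} runs")

# A late-arriving row out of sort order fragments the region runs.
append_row(columns, ("EMEA", "widget", 4))
print(run_length_encode(columns[0]))
```

Because values of one attribute sit next to each other, the sorted region column shrinks from eight values to two runs, which a row layout interleaving region, product and units cannot exploit. The final insert shows the other side of the trade-off: one new row touches every column and breaks the neat runs, which is why columnar stores favour bulk loads over frequent concurrent updates.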

Companies are increasingly contemplating moving their data warehouses into the cloud. Kognitio was a pioneer in this area, and other vendors now offer this possibility, although on-premise deployment remains the dominant data warehouse paradigm. The steadily declining price of flash memory means that ever larger data warehouse applications can be handled in memory rather than on slower (but cheaper) disk. The major hardware appliance vendors are taking advantage of this, though the trend is offset by the steady increase in overall data volumes.

In addition to the companies providing databases and associated data warehouse technologies, a couple of vendors provide data warehouse offerings that run on third-party databases. One is WhereScape; the other, Kalido, was acquired in late 2013 by a Texas company called Silverback. Given the lack of clarity around Kalido's future direction, I have dropped it from the chart until it becomes clear what its new owner intends for the technology.

Overall, the data warehouse market has never been more exciting from a technology point of view. A wide range of technologies is now being applied to satisfy our insatiable appetite for more data, and our need to make sense of that growing volume of data.

With such varied sub-markets, it is important that end users carefully consider which alternatives match their particular needs. High-level overviews of the market, such as this Landscape, cannot capture specific customer requirements, and any technology selection process should be discussed in detail with an analyst.

As part of the research process, vendors were asked to provide customer references, who were then sent a survey on their satisfaction with the vendor's products (if a vendor failed to provide sufficient references, a neutral score was assigned). Based on this survey, the data warehouse vendor with the happiest customers in 2013 was Kognitio, followed very closely by Teradata, then Calpont, SAS and InfoBright.

Below is a set of vendors who provide data warehouse technology, many of which are in addition to those covered in our core research.

Vendor | Brief Description
Actian | Actian's Vectorwise product is an analytic database on commodity
Algebraix Data | Analytic database running on SMP
Calpont | Provides a column-oriented, MPP analytic database called
Cloudera | Provides a distribution of the Hadoop data management
Exasol | German data warehouse appliance
Greenplum | Appliance vendor aiming at high-end warehouses, now part of
IBM | InfoSphere Balanced Warehouse (formerly DB2) is the data warehouse software offering from the industry giant, which also offers two appliances: PureData for Operational Analytics (based on DB2) and PureData for Analytics, powered by Netezza technology.
InfoBright | Provides a columnar-database analytics
Kognitio | Mature data warehouse appliance, and offers its data warehouse as a
Kalido | Not an appliance, but rather an application to generate data warehouses that adapt to change, running on various database platforms.
Microsoft | As well as its SQL Server relational database, Microsoft acquired DATAllegro and at the end of 2010 launched its Parallel Data Warehouse based on this
MonetDB | MonetDB is an open-source database system for high-performance
Neo4j | Open source graph
Oracle | As well as its well-established database, Oracle offers the Exadata warehouse
ParAccel | Provides a column-oriented database
Sand | Focuses on allowing customers to effectively retain massive amounts of compressed data in a near-line repository for extended
SAP/Sybase | Sybase was a pioneer in column-oriented analytic database technology and was acquired in mid-2010 by the software giant SAP. SAP now also offers the in-memory database technology HANA.
SAS Institute | Comprehensive data warehouse technology from the largest privately owned software company in the
1010 Data | Provides column-oriented database and web-based data analysis
Teradata | Arguably the original pioneer of the data warehouse
Vertica | Appliance vendor Vertica was purchased by HP in
WhereScape | Not an appliance, but a framework for the development and support of data