The Data Quality Landscape - Q1 2016
The data quality market for the calendar year 2015 was worth around $1.1 billion, of which software sales and maintenance accounted for around $900 million. The overall figure includes the professional services arms of data quality vendors, but excludes the (substantial) revenues of systems integrators and consultancies involved with data quality initiatives.
Data quality is a familiar problem, an issue that affects every company to one degree or another. The software industry segment that tackles data quality grew up to handle the specific problem of improving mailing lists, which was a conveniently large market and one that was relatively tractable to software solutions. There are many well-understood algorithms that can detect likely name and address errors and assist in matching potentially duplicate entries. Database records referring to “Mr A. D. Hayler”, “Andy Hayler” and “A. Hayler” may well refer to the same person, especially if that person happens to be at the same mailing address. Validation of postal codes is a related issue, though this is country-specific, since some countries lack postal codes entirely or have only partial coverage.
However, data quality software these days can do much more than basic cleansing and matching. A given address can be geo-coded, and from that information quite a rich set of data can be inferred about that location. For example, it is possible to discern whether a specific address is in a certain voting district, on a flood plain (handy if you are an insurer) or simply near a particular bank or shop (useful for consumers). If the address is a business, then it is possible to display how many people work at the site, or even its credit rating, using the many business databases that exist these days.
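To make the name-matching idea concrete, here is a minimal Python sketch that reduces the three name variants mentioned above to a common key. It is an illustration only, not how any particular vendor's product works: the `TITLES` set, the `name_key` and `likely_same_person` functions and the record layout are all hypothetical, and real tools rely on much richer reference data and fuzzier comparisons.

```python
import re
from typing import Tuple

# Hypothetical honorifics to ignore; a real product would use a far larger list.
TITLES = {"mr", "mrs", "ms", "miss", "dr", "prof"}


def name_key(full_name: str) -> Tuple[str, str]:
    """Reduce a personal name to (surname, first initial) for coarse matching."""
    # Split on whitespace and full stops, lower-case, and drop empty tokens.
    tokens = [t for t in re.split(r"[\s.]+", full_name.lower()) if t]
    tokens = [t for t in tokens if t not in TITLES]
    if not tokens:
        return ("", "")
    surname = tokens[-1]
    # Use the first letter of the first remaining token as the initial, if any.
    initial = tokens[0][0] if len(tokens) > 1 else ""
    return (surname, initial)


def likely_same_person(rec_a: dict, rec_b: dict) -> bool:
    """Treat two records as probable duplicates if name key and address agree."""
    same_name = name_key(rec_a["name"]) == name_key(rec_b["name"])
    same_address = rec_a["address"].strip().lower() == rec_b["address"].strip().lower()
    return same_name and same_address
```

Under these assumptions, “Mr A. D. Hayler”, “Andy Hayler” and “A. Hayler” all reduce to the same key, so two such records at the same address would be flagged as probable duplicates, while “B. Hayler” at that address would not.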
Over the years, algorithms for duplicate detection and matching have become more sophisticated. Some software applies well-tried algorithms that detect frequent spelling mistakes, while other products have built up business glossaries that allow matching based on experience and the context of the records. Data quality software usually allows businesses to “tune” the likelihood of matching records, automatically merging those above a particular threshold of match certainty, and referring borderline cases for human review. In some circumstances you will want to flag up possible matches more readily than in others: detecting a potential terrorist boarding an airliner has different consequences from a duplicate marketing letter being sent out, for example.
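The threshold tuning described above can be sketched in a few lines of Python. This is a simplified illustration rather than any vendor's method: it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the `triage` function and its `merge_at` and `review_at` thresholds are hypothetical names and values chosen for the example.

```python
from difflib import SequenceMatcher


def match_score(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0, ignoring case (stand-in for a real matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(a: str, b: str, merge_at: float = 0.95, review_at: float = 0.80) -> str:
    """Classify a candidate pair using two tunable thresholds (assumed values)."""
    score = match_score(a, b)
    if score >= merge_at:
        return "merge"      # confident enough to merge automatically
    if score >= review_at:
        return "review"     # borderline: refer to a human
    return "distinct"       # treat as different records


# With the default thresholds a near-miss is sent for human review.
print(triage("Andy Hayler", "Andy Hayter"))
# Lowering review_at flags more pairs, as a watch-list check might require.
print(triage("Andy Hayler", "Andrew Hailer", review_at=0.50))
```

Lowering the thresholds catches more potential matches at the cost of more false positives, which is the trade-off between the airline screening and marketing mail-shot cases.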
The industry offers a considerable range of solutions, from basic postal code validation to full-service data quality suites, possibly linked to master data management or data governance offerings. Not surprisingly, there is a considerable price difference between the elaborate suites of software from Informatica, IBM, SAP and SAS and the more specialist tools that focus on a specific problem or market. Some tools tackle the problem of data quality in general rather than just for customer data. For example, product data frequently requires parsing from free-text files, and tools may provide the ability to match data to standards such as UNSPSC or GS1. Compliance is an increasingly important driver for data quality in industries such as banking, pharmaceuticals and insurance, where the consequences of poor data quality can now be very costly indeed.
Many vendors provide some form of cloud offering these days, though in most cases the traditional on-premise model still accounts for the vast majority of licence revenue. Some vendors have developed links to Hadoop files in order to position themselves for “Big Data” quality, though discussions reveal that the deployment of data quality in such environments is still at a very early stage. Some vendors have now tested their products with the main commercial implementations of Hadoop, such as Cloudera, Hortonworks and MapR. There is certainly a need for data quality in this environment if companies are to avoid their “data lakes” becoming data swamps.
The diagram that follows shows the major data quality vendors, displayed in three dimensions. See later for definitions of these.
It is important to understand that this is a high-level representation of the market, with vendors represented on the chart specialising in different areas and at very different price-points. If you are considering data quality software, it is important to tailor your selection process to the particular needs that you have rather than relying on high-level diagrams such as this. The Information Difference has various detailed models that can assist you in vendor selection and evaluation.
As part of the landscape process, each vendor was asked to provide at least ten reference customers (some vendors provided over 50 references), which were surveyed to determine their satisfaction with the data quality software of the vendor. The happiest customers based on this survey were those of helpIT, Trillium and Innovative Systems, followed by Datactics, Experian and Infoshare. Congratulations to those vendors.
Below is a list of the main data quality vendors.
|ActivePrime||California-based vendor of data quality for CRM systems.||www.activeprime.com|
|Address Doctor||Vendor that specialises in providing wide coverage of name and address information; now owned by Informatica.||www.informatica.com/addressdoctor.html|
|Ataccama||Prague-based company with a modern data quality suite.||www.ataccama.com|
|Business Data Quality||UK-based data profiling vendor.||www.businessdataquality.com|
|Capscan||London-based provider of address management and data integrity services, now owned by GB Group.||www.gbgplc.com/uk/|
|Data Mentors||Long-established US data quality vendor.||www.datamentors.com|
|Datactics||UK-based vendor of data quality and matching software to banking, finance, government, healthcare and industry.||www.datactics.com|
|DataQualityFirst||US start-up whose application runs on top of IBM QualityStage.||www.dataqualityfirst.com|
|Datiris||Colorado vendor of data profiling technology.||www.datiris.com|
|Datras||Munich-based vendor with wide-ranging data quality functionality.||www.datras.de|
|DQ Global||UK data quality and address verification software.||www.dqglobal.com|
|Experian||UK-based vendor specialising in customer name and address validation, recently expanded in scope via the integration of X88's technology.||www.edq.com|
|Google||The search engine giant does data quality.||github.com/OpenRefine|
|helpIT||US/UK vendor of batch and real-time data quality solutions including address validation.||www.helpit.com|
|Human Inference||Dutch data quality vendor.||www.humaninference.com|
|IBM||Data quality software from the industry giant.||www.ibm.com|
|Infogix||Illinois-based vendor specialising in controls and compliance.||www.infogix.com|
|Infoglide||US vendor specialising in identity resolution.||www.infoglide.com|
|Informatica||California-based vendor, a major player in data quality.||www.informatica.com|
|Infoshare||UK data quality vendor specialising in the public sector market.||www.infoshare-is.com|
|Innovative Systems||Long-established data management vendor that has leveraged its knowledge-based data quality solutions into Data Discovery, MDM, and Data Governance offerings.||www.innovativesystems.com|
|Inquera||Israeli company with an approach to product data quality using machine-learning technology based on subject domain experts’ knowledge.||www.inquera.com|
|Intelligent Search||Identity management company now with a more general data quality capability.||www.intelligentsearch.com|
|Irion||Italian data quality vendor specialising in financial services.||www.irion.it/index.php/en/|
|Melissa Data||US/German global data quality vendor offering address verification, geocoding and matching solutions.||www.melissadata.com|
|Microsoft||DQS is the data quality offering of the Redmond software behemoth.||www.microsoft.com|
|Netrics||New Jersey vendor of matching software. Now owned by Tibco.||www.tibco.com/products/automation/application-integration/pattern-matching|
|Oracle||The software giant's data quality offerings are based on the acquisitions of Datanomic and SilverCreek.||www.oracle.com|
|Pitney Bowes||Pitney Bowes, a global technology company, provides data quality solutions through its Customer Information Management (CIM) unit, which is part of its Software Solutions division.||www.pitneybowes.com/us/customer-information-management/data-quality.html|
|Postcode Anywhere||UK vendor of web-based addressing software.||www.postcodeanywhere.co.uk|
|SAP||The software giant is a major data quality player.||www.sap.com|
|SAS||One of the leading players in data quality.||www.sas.com/en_us/software/data-management/data-quality.html|
|Satori Software||Seattle-based provider of address management solutions.||www.satorisoftware.com|
|Talend||Open source vendor with a wide range of quality functions that are tied to data integration and MDM.||www.talend.com|
|TAMR||Vendor that applies machine learning to the data quality problem.||www.tamr.com|
|Trillium||A Harte Hanks company and one of the leading data quality vendors.||www.trilliumsoftware.com|
|Uniserv||Large German data quality vendor.||www.uniserv.com|
Other vendors of data quality software include:
The Information Difference Landscape diagram shows three dimensions of a vendor:
▪ Market strength
▪ Technology
▪ Customer base.
“Market strength” is made up of a weighted set of five factors: revenues, growth, financial strength, geographic scope and partner network. Each of these individual elements is scored, the total producing the “market strength” figure. Similarly, “technology” is made up of four factors: “technology breadth” (the coverage of the vendors in various data quality areas as illustrated below), the longevity of the software in the market, analyst perception of the product via briefings, and feedback from the reference customers that we surveyed (this has a high weighting). In each case the scoring is on a scale of 0 (worst) to 6 (best).
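As an illustration of how a weighted set of factors on a 0 to 6 scale can be combined into a single figure, here is a minimal Python sketch. The actual weights used in the landscape are not published here, so the weights and scores below are purely hypothetical, as is the choice to normalise by the total weight so that the result stays on the same 0 to 6 scale.

```python
def weighted_score(scores: dict, weights: dict) -> float:
    """Combine per-factor scores (0 to 6) into one figure using relative weights."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same factors")
    for factor, value in scores.items():
        if not 0 <= value <= 6:
            raise ValueError(f"score for {factor!r} must be between 0 and 6")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Weighted average: stays on the 0-6 scale whatever the weights add up to.
    return sum(scores[f] * weights[f] for f in scores) / total_weight


# Hypothetical weights and scores for the four "technology" factors.
technology_weights = {
    "technology breadth": 2,
    "longevity": 1,
    "analyst perception": 2,
    "customer feedback": 4,  # assumed heavier, since it has a high weighting
}
vendor_scores = {
    "technology breadth": 4,
    "longevity": 5,
    "analyst perception": 3,
    "customer feedback": 6,
}
print(round(weighted_score(vendor_scores, technology_weights), 2))
```

The same function could be applied to the five “market strength” factors with a different set of weights.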
Vendors were asked to submit answers to various questions via a questionnaire. Vendors were interviewed directly by an analyst and their software demonstrated and assessed. Reference customers were surveyed to give their experience of the software of each vendor. The technology functions which the vendors were asked about are as shown below. These are drawn from the Information Difference vendor functionality model; if you are interested in more detail on this then please contact The Information Difference.