Friday, August 16, 2013

Mastering the Right Data Management Solution

Choose what's right for you, not what you are lured with!

We are often confused by our needs. Sometimes we have no options and have to trust that what we get is the best. At other times we are showered with so many offers that we are not sure whether what we are choosing is actually what we want. Brand names, attractive features, and extravagant style often blind our judgment, and we end up with an overpriced, superficially complex, high-maintenance product.

Typical MDM Project - Business Value realization

The goal of any MDM implementation is to realize business value. The following diagram shows how a typical MDM solution is implemented.

There are multiple challenges that need to be overcome in order to realize "Mastering the Right Data Management Solution".

First Challenge – Choosing the Right MDM Solution

So how do we choose?

A very innocent question can result in endless nights of brainstorming. There is no one-size-fits-all tool. First of all, we should understand that each enterprise has a unique way of operating. Although we can broadly categorize industries into types, each enterprise is unique in its own way. That is the beauty of it, as well as the sole reason for their existence. It also means that no two enterprises have exactly the same problems to address.

On one hand, there are organizations that want to streamline business operations to reduce fraudulent activity and therefore keep introducing norms and guidelines that an enterprise has no choice but to adhere to. On the other hand, there are critical business-related and revenue-generating operations that need specific governance and management rules.

Hence what you need is not a tool but a solution to your problem, and an MDM solution comprises three main parts:
1. The MDM Tool
2. The MDM Architecture
3. The expertise of the Consultants on board.

We are all about data. So the first and foremost thing we need is an IT-savvy business leader who knows the data and is prepared to take ownership of it.
From an MDM practice point of view, an MDM consultant can guide you toward the right architecture or the right practice, but you, as the business owner, should be the one who ultimately decides whether that is what you want. To do so, keep the following points in mind.

1. Know your data and its value.
2. Always keep the big picture in mind.
3. Think big but start small. Always remember that Rome wasn't built in a day.
4. Before you are blinded by flashy terms like Multi-Domain, Big-Data, and Dynamic-Hierarchy, ask whether you even need them. Are they going to generate any extra bucks for you? Don't you already have an in-house solution that delivers the same thing?
5. Ask how you can make the best use of your existing resources and still get the most robust and trustworthy solution.

Second Challenge – Choosing the Right MDM Architecture

It is critical to understand what type of data problem you have, if you have one at all. This matters because mastering the data and creating a "Golden" copy of information, or "the best version of the truth", however you put it, is not a single-team job. It is an amalgamation of multiple task forces, so enterprise architectural integration is also critical. But let's stick to the issue at hand, which is MDM. What type of MDM do we need? To answer that, we first need to understand the implementation patterns available:

  • Consolidation 
  • Registry
  • Centralization
  • Co-existence

Consolidation applies when we know that there are silos of the same or similar data in the enterprise and we need to harmonize them into a golden copy. In fact, the aim is to identify the best version among all the stored information. To do that, we bring the records to a central location, de-dupe them, and execute survivorship to generate the golden record. At times the rules defined in the system are not enough, and we need expert intervention to resolve duplicates, or even to decide whether records are duplicates at all, for example by identifying false positives and dealing with them.
Typically, large enterprises that adopted digitalization long ago and have collected information from multiple sources such as ERP, CRM, web apps, websites, and POS end up with this sort of problem.

So, if we need consolidation, the MDM solution should be capable of handling complex matching scenarios and should give developers the flexibility to pick and choose among different matching algorithms and matching techniques.
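To make the consolidation steps concrete, here is a minimal Python sketch of match, de-dupe, and survivorship. Everything in it is an assumption made for illustration: the record fields, the match rule (same email, or names that are at least 85% similar according to the standard library's difflib), and the survivorship rule (the most recently updated non-empty value wins). A real MDM tool would let you configure all of these, and would route borderline matches to a data steward.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass
class SourceRecord:
    source: str    # originating system, e.g. "CRM" (illustrative)
    name: str
    email: str
    phone: str
    updated: date  # last update in the source system


def is_match(a, b, threshold=0.85):
    """Hypothetical match rule: same email, or very similar names."""
    if a.email and a.email.lower() == b.email.lower():
        return True
    ratio = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    return ratio >= threshold


def cluster(records):
    """Greedy de-dupe: put each record in the first cluster it matches."""
    clusters = []
    for rec in records:
        for group in clusters:
            if any(is_match(rec, other) for other in group):
                group.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def survive(group):
    """Assumed survivorship rule: newest non-empty value per attribute."""
    newest_first = sorted(group, key=lambda r: r.updated, reverse=True)
    golden = {}
    for field in ("name", "email", "phone"):
        values = (getattr(r, field) for r in newest_first)
        golden[field] = next((v for v in values if v), "")
    golden["sources"] = sorted(r.source for r in group)
    return golden


records = [
    SourceRecord("CRM", "Jane Smith", "jane@example.com", "", date(2013, 1, 5)),
    SourceRecord("ERP", "Jane  Smyth", "JANE@example.com", "555-0100", date(2012, 6, 1)),
    SourceRecord("POS", "Robert Brown", "", "555-0199", date(2013, 3, 9)),
]
golden_records = [survive(group) for group in cluster(records)]
```

In this made-up data set, the CRM and ERP rows share an email address, so they collapse into one golden record that takes the name from the newer CRM row and the phone number from the ERP row, because the CRM row has none. The POS row stays on its own.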

Sometimes organizations realize that they have data silos but continue to manage and maintain the data in silos because the business demands it. It is still critical for them to understand where the same information exists across the organization and what kind of information is available in which system. In that case they go for a Registry implementation, in which the MDM solution is responsible only for maintaining a map, or linkages, of similar or identical information stored in the various systems, which can be used for cross-reference purposes.
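As a rough illustration of the registry idea, the Python sketch below keeps only links between systems and never copies the attribute data itself. The class name, the method names, and the identifiers such as "M-1" or "C-1001" are all hypothetical and not taken from any MDM product.

```python
from collections import defaultdict


class Registry:
    """Stores only cross-system links, never the attribute data itself."""

    def __init__(self):
        self._by_master = defaultdict(set)  # master_id -> {(system, local_id)}
        self._by_local = {}                 # (system, local_id) -> master_id

    def link(self, master_id, system, local_id):
        """Record that a local record in a system belongs to a master entity."""
        self._by_master[master_id].add((system, local_id))
        self._by_local[(system, local_id)] = master_id

    def cross_reference(self, system, local_id):
        """Return where else the same entity lives, keyed by system name."""
        master_id = self._by_local.get((system, local_id))
        if master_id is None:
            return {}
        return {
            s: i
            for s, i in self._by_master[master_id]
            if (s, i) != (system, local_id)
        }


registry = Registry()
registry.link("M-1", "CRM", "C-1001")
registry.link("M-1", "ERP", "E-77")
registry.link("M-1", "POS", "P-9")

xref = registry.cross_reference("CRM", "C-1001")
```

Here `xref` tells a CRM user that the same customer is known as E-77 in ERP and P-9 in POS, while each system keeps authoring its own copy. For simplicity the sketch assumes at most one local record per system for a given entity.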

Typically, newer organizations start with the Centralization pattern. It is the best-case scenario, where the MDM solution is in fact the authoring tool. Hence it is critical to have strong governance or business process management capability to master the data correctly. The MDM solution should have data validation and workflow capabilities to support this implementation strategy, and it is responsible for providing the mastered golden record to all the relevant applications in the organization.

Co-existence is the reality in most MDM implementation projects. The aim is always to identify data silos and consolidate them, then use the MDM product as the authoring tool as well and decommission the other authoring solutions in phases.

So we start with the Consolidation pattern and aim for the Centralization pattern. However attractive this approach seems, it is often not possible: specialized COTS or custom full-stack apps address very specific, business-critical requirements and cannot be decommissioned, so co-existence is the only option. Here we not only consolidate information coming from other systems but also enrich it in MDM and then publish it to everyone else.

Third Challenge – Building a Business case for MDM

✓ Reconciliation – Reducing the direct costs of bad data
✓ Reporting – Ensuring accurate reports for high-quality decision making
✓ Regulatory compliance – Maintaining the organization’s good standing with the authorities
✓ Risk management – Avoiding business risks


Reconciliation

Reconciliation is the process of diagnosing and then repairing mismatches between an organization’s accounts/groups/datasets/systems/transactions.
Without an authoritative source of master and reference data, where do your users go to find the right codes, cost centers, or product numbers, or to verify the definition of specific reference data elements? Do you trust the sources your enterprise systems are consuming from?
Without a clearly defined master source, systems and users run the risk of using incorrect or out-of-date information. All of this creates reconciliation problems that need to be fixed later.

People costs
How much time do our employees spend on reconciliation? What impact does this have on employee morale?
# Employees x Average Cost/Employee/Hour x Average Hours Spent† on Issues annually

Costs due to lack of centralization and ambiguity
What’s the cost of independently maintaining master data? Is there value in simplification? What’s the value of a single authoritative source?
# of Systems x Average Time (hours)‡ Required to Update System x Average Cost/Hour to Update

†Employee time should include time to report issues; diagnose, research, and repair errors; and updates to avoid future issues.

‡Other costs to include are the time required to make changes to the synchronization code (everywhere it exists) when there are system updates
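The two reconciliation formulas above are simple products, so they are easy to put into a small Python sketch and try with your own estimates. The function names and every number below are made up purely for illustration. Note that the second formula, as written, gives the cost of pushing a single master data change to every system.

```python
def people_cost(employees, cost_per_employee_hour, hours_per_year):
    """# Employees x Average Cost/Employee/Hour x Average Hours Spent annually."""
    return employees * cost_per_employee_hour * hours_per_year


def system_update_cost(systems, hours_per_update, cost_per_hour):
    """# of Systems x Average Time (hours) to Update x Average Cost/Hour."""
    return systems * hours_per_update * cost_per_hour


# Illustrative inputs only: 20 employees at 50 per hour, each spending
# 100 hours a year on reconciliation issues.
annual_people_cost = people_cost(20, 50.0, 100)

# Illustrative inputs only: 8 systems, 3 hours to update each, 60 per hour.
cost_per_change = system_update_cost(8, 3, 60.0)

print(annual_people_cost)  # 100000.0
print(cost_per_change)     # 1440.0
```

With these invented inputs, people costs come to 100,000 a year, and every master data change that has to be applied in all eight systems costs another 1,440.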


Reporting

Enterprise hierarchies, dimensions, and attributes are used to aggregate or subdivide the source system data presented in reports. When reporting across systems, the individual source system dimensions must be standardized (conformed).
Without MDM there is no enterprise source of reporting metadata (conformed dimensions and hierarchies). This means the reporting team prepares for reporting by manually conforming and mapping dimensions. Manual effort adds time to the reporting process, introduces errors, and reduces flexibility.

Costs associated with report preparation
How long does it take to prepare the reporting metadata (master data, dimensions, and hierarchies/rollups)? Does anyone review (or approve) the prep? Is this work duplicated throughout the organization?
# Reports x # Reports per Year x # Hours Spent on Prep† per Report x Weighted Average Cost per Hour

Other questions
How are adjustments (reporting quality issues) managed? Do the same issues recur? Is root cause analysis performed on each issue?
Are ad-hoc reports easy to create? Are there requirements to report on demand? Should there be (e.g., for crisis management reasons)?
Does the business want more control (e.g., self-service hierarchy management)? More timely or easier-to-create reports?

†Time spent on prep should include all the time ironing out the inconsistency in the dimensions. This might include creating/merging new codes, comparing/updating hierarchies, and building mappings (or crosswalks) between the different taxonomies used in the organization. Mismatching taxonomies are often found in global companies where each region and country may choose to use different product hierarchies.
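The report preparation formula can be sketched the same way in Python. I am reading the first two factors as "number of distinct reports" and "times each report is produced per year", which is my interpretation rather than something the formula spells out. The weighted average cost per hour is computed from an invented mix of analyst and manager time, and all names and figures are illustrative.

```python
def weighted_average_cost(hours_and_rates):
    """Average hourly cost, weighted by the hours each role contributes."""
    total_hours = sum(hours for hours, _ in hours_and_rates)
    total_cost = sum(hours * rate for hours, rate in hours_and_rates)
    return total_cost / total_hours


def report_prep_cost(reports, runs_per_year, prep_hours_per_report, cost_per_hour):
    """# Reports x # Reports per Year x # Hours on Prep x Weighted Avg Cost/Hour."""
    return reports * runs_per_year * prep_hours_per_report * cost_per_hour


# Illustrative mix per report: 3 analyst hours at 40/hour, 1 manager hour at 80/hour.
mix = [(3, 40.0), (1, 80.0)]
hourly = weighted_average_cost(mix)

# Illustrative volume: 12 distinct reports, each produced 4 times a year,
# with 4 hours of metadata prep per run.
annual_prep_cost = report_prep_cost(12, 4, 4, hourly)

print(hourly)            # 50.0
print(annual_prep_cost)  # 9600.0
```

The point of the weighting is that prep time is rarely all junior time. A few reviewer or approver hours at a higher rate can move the average noticeably.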

Regulatory Compliance

Many organizations need to comply with region-, country-, or industry-specific data governance, data quality, and data management regulations. Examples include European privacy rules, US healthcare regulations (HIPAA, HITECH), and banking regulations (BCBS 239 globally, and the Dodd-Frank Act, or DFA, in the USA).
MDM supports compliance by acting as a place to coordinate policies (governance), their scope (system metadata), implementation (data management), and measure and report on data quality. In addition to traditional data governance policies, some groups may include their privacy policies and their intersection with data domains and security classifications.
Without an integrated foundation, gaps are created between each element of the data governance and data management framework.

Compliance costs
How will the regulator assess compliance? Are there benchmarks for compliance? How much flexibility do we have in terms of compliance?
What are the penalties for non-compliance or poor compliance? Are the penalties civil or criminal in nature? Can employees or executives be found personally culpable?
Does non-compliance create legal standing for customers/suppliers/partners to pursue legal action against us?

Costs due to a lack of integration with data governance
How are our data governance policies created and maintained? Are these policies integrated with data management systems (can we track them)? How do we handle policy exceptions? How do we measure KPIs?

How are issues created and managed through to resolution?

Risk Management

Master data are an important enterprise data asset because they represent the key elements of your business model: what you sell, to whom, and how, and how you account for what you sell.
Without master data management, low-quality data proliferates.
Without MDM, there are few formal controls on the creation, update, and retirement of master data, which translates into a lack of review and accuracy checks on customer data, product information, and financial accounts. (Incidentally, this is, in and of itself, a form of operational risk.)
Also, without MDM, there is often confusion about which system holds the authoritative master data. Confusion means some teams will choose the wrong source. And given that master data is shared, this poor-quality data will continue to create more low-quality data as it spreads.

Do we know of situations where poor quality master data affected our organization’s operations? What is the frequency of these problems?
Where were the issues: In-bound logistics? Operations? Distribution? Sales/marketing? Services?
Do we know what the root cause of the errors was? Were they tied to poor operational controls around the master data?
Did the organization suffer any reputational damage in the eyes of our customers, suppliers, distributors, or employees?
Did the failure create any financial issues for the firm? Did supplier terms change? Did customers demand deeper discounts? Did this change customer/supplier/employee retention?
How do our failure rates compare to our competitors? Industry benchmarks? If we improve beyond the benchmark, can it be used as a point of differentiation?

Do we know of situations where poor-quality master data has affected our organization's strategic planning/budgeting process?

Fourth Challenge – Choosing the Right MDM Tool

We need to ensure that the following points are realized.

✓ MDM/RDM can help an organization avoid costs, reduce risks, and create value
✓ The 4-Rs (reconciliation, reporting, regulatory compliance, and risk management) are all areas to examine when building business cases
✓ Not all value (or cost) is quantifiable; understanding qualitative value is often just as important as quantifying cost savings

Selecting The Right Tool

Modular MDM Solution Vs Unified MDM Solution 

Master Data is the most important data for any enterprise. And as rightly defined in Wikipedia: In business, master data management (MDM) comprises the processes, governance, policies, standards, and tools that consistently define and manage the critical data of an organization to provide a single point of reference.

The concept of MDM has been in the industry for almost a decade now. I remember the initial days of MDM. It was all about Customer Data and Product data as two separate domains. There were two different originating points. Customer Data Integration (CDI) and Product Information Management (PIM) [That's what my older projects were known as]. Just like wine, with time the concept has matured and has been bettered in so many ways. People (read CTO/CFO/COO/Architects/Project Leads) started understanding the MDM ecosystem and the challenges that come along with it.

It didn't take forever to realize that just having a match & merge module or just having workflows for creating a golden record is not going to be enough. Hence the modern-day MDM tools started evolving themselves and started addressing each and every aspect of MDM.
  • Data consolidation via match & merge.
  • Granular survivorship capability.
  • Excellent UX for stewardship
  • Data governance via workflows and identity management.
  • Data security at various levels. Data lineage.
  • Hierarchical representation of data.
  • The capability of customizing UI for rich UX.
  • Data distribution patterns using data objects, web services, REST, SQL, batch, etc.
  • Reporting capabilities
  • Integration capabilities with legacy and modern systems
  • Cloud-based
  • Scalability and high availability.

The MDM product companies, aka product vendors, now have this vision, thanks either to the visionaries on their teams or to the analyst firms that score them on completeness. The question was how to be the most complete tool: how to get all the checkboxes ticked in order to be the most sought-after "Enterprise Master Data Solution".

The race started a few years back and the goal was clear. There were no rules, but the message was loud and clear: "If you can't evolve you got to lose". Darwin's survival of the fittest has now found a new meaning. By this time there were two major groups of MDM product vendors, "The Pioneers" and "The Challengers". In the era of PIM and CDI, the pioneers were also the leaders. But the new wave of demand for mature, complete, and smart enterprise MDM tools has erased the demarcation between pioneers and challengers and made the market more competitive. Two types of players emerged from this competition: those with the money to acquire or merge in order to overcome their deficiencies, and those with the intelligence and skill to develop the features they used to think were not required.

It is a win-win situation for the consumers, that is, the business enterprises. All they need to decide is whom they want.
Do they want:
a) Companies that provide a complete enterprise package and have dedicated experts or SMEs for each module.
b) Companies that provide a unified solution and have dedicated experts or SMEs for that single solution.

The game is quite simple now. If you want to win the client, you should be able to impress the techies and the business users who actually use the tool. The game is now played in the conference room pilot and not on the golf course. Some soft-corner and legacy-driven decisions still take place, but in the long run those decisions seldom hold up under pressure and expectation.

In order to select the right product, there are two broad criteria.
  1. Based on Domain
  2. Based on the Implementation Pattern

Based on Domain 

We need to select a product/solution which can seamlessly handle multiple domains either business or technical. 

Business domains include:
  • Customer
  • Vendor
  • Asset
  • Product
  • Location
  • Pricing etc.
Technical domains include:
  • Master Data
  • Meta Data
  • Reference Data
  • Transactional Data
We need to pick a tool that is genuinely multi-domain, not a set of separate tools for multiple domains. You don't want your IT team to go through the painful process of implementing a separate MDM for each domain.

I have mentioned the same in one of my previous blogs which I published last year (2012) as to why Multi-Domain MDM is so important.

Based on the Implementation Pattern

As explained in the architecture section above, we must identify a solution that supports all, or at least almost all, of the implementation patterns.

So, keep an eye on a solution that supports a multi-domain and multi-implementation strategy.

Fifth/Final Challenge – Implementation

While implementing an MDM solution, we need to lay a strong foundation for operational excellence by building an agile and competitive solution that ensures quality access to the master and reference data domains.

  • Establish master data priorities as part of the larger data and technology strategy.
  • Embrace the master data standards throughout the organization.
  • Enable governance by mandating a stewardship process for maintaining master and reference data using workflows.
  • Implement the change process consistently across the enterprise.
  • Establish roles and responsibilities using data integrity.

Kinshuk Dutta
