This organization had started a strategic program to massively increase the visibility of all its data, including data from early research carried out in various labs across Europe and the US. One goal was to increase controlled access to and reuse of data (notably by AI and Data Science). Another was to better understand information flows and their supporting systems in order to improve information governance (including for personally identifiable data).
A corporate goal for “data democratization” that had led to significant investments in data lakehouses was dependent on better data governance and visibility to deliver value.
The client conducted a quick product selection and recommended the industry-leading product Collibra. This enterprise platform would ultimately scan all data stores (i.e. hundreds of databases, applications and other data stores) to discover data objects and assets, and it would need to be kept in a validated state.
The preferred delivery approach was agile and the sponsor was the data science group. As the goal was a single enterprise-wide solution for data cataloguing and data governance, with future economies of scale, the organisation needed to be confident that the right balance would be struck whenever a decision between speed of delivery and scalability was needed. Subsequent capability increments required by other business units should not impact existing users.
Enterprise architecture therefore requested a vision and roadmap, recommended guardrails, overall principles, and the data and technology patterns to use in the chosen platform. An outline of the operating model for the platform (based on the emerging product-driven delivery approach) was also needed.
The organisation had grown fast and had made a number of tactical deployments of solutions for managing employee data, organizational data, systems, projects and infrastructure assets. A number of other systems of record for applications, technology, organizations, processes, change initiatives, etc. were therefore also being considered for related areas.
(Case Study)
Solutions proposed
The following technologies were considered:
- Alation
- Collibra
- Informatica MDM
- AWS Glue
- ServiceNow-based data discovery solution
For Collibra, two deployment options were considered:
- deployment on premise (private cloud)
- SaaS (with AWS and GCloud options)
Client contribution
The client engaged a business analyst who helped identify the key use cases. The client also engaged a small US-based consulting company to carry out the product selection and to deploy the platform in a validated state.
The client was also responsible for defining the target processes and for prioritizing the use cases.
Outline
The client wanted to use a product-based, iterative delivery approach. About 25% of the resources on the initiative were client employees. New processes were being defined by a separate enterprise data management program, as part of defining this new capability. The same program was responsible for identifying all the corporate data assets.
The client had multiple data store technologies, including AWS S3, AWS RDS, Snowflake, Databricks, Veeva, Azure, and many file-based repositories.
Preliminaries
The client had a discovery program to create an initial catalogue covering all the locations and types of hosting of data stores.
In addition, a summary of the current approach to data governance, together with findings from regulatory and compliance requirements, shortened some of the steps needed to establish the requirements, constraints, etc.
Deliverables and outcomes
Deliverables included: non-functional requirements, vision and roadmap, business capability model, data flow diagrams, data models, application target and stage transition architectures, interface catalogue, and security model. These views were captured in documents such as the Conceptual and Logical Architecture and the Technology Architecture.
We also completed a report with recommendations on the operating model, liaising with the client security and data privacy experts.
The main outcomes were:
- a validated Data Governance and Data Management platform
- a roadmap aligned to related enterprise-wide capabilities, which the product delivery team could use as guidance
Requirements
The main objectives were a global platform, fast deployment, low cost of operation, and better enterprise data management, from both a value-generating and a regulatory compliance viewpoint.
Next steps
Following the qualification of the infrastructure and operational phase, further alignment of the IT4IT toolset was required, such as:
- Integration of the ServiceNow platform with the DGDM platform
- The data privacy program also built on the catalogue to support compliance audits
Notes
The overall program costs were underestimated. There was also a lack of accessible Collibra professional services with expertise in the industry; such expertise was a success factor for adoption beyond the Data Science group that was sponsoring the platform.
It would have been more beneficial to develop a more complete architecture and roadmap earlier. This would have provided more clarity as to which data assets would be discovered and curated in each sprint. It would also have reduced the cost of maintaining validation after go-live.
The approach taken to qualify the platform did not cover all the requirements, leading to multiple iterations.
Feedback
The client praised the way we brought together industry knowledge and expertise to define the architecture for enterprise-grade technologies.
The client was very pleased with the vision and engaged us for similar architecture definition work on other enabling platforms, such as those supporting application catalogues, data privacy compliance and infrastructure management.
Notes: Key factors for fast engagement include an appropriate level of planning detail (including the availability of key resources), prior identification of relevant accelerators, and our flexibility to adapt to your ways of working.
Our use case descriptions may combine outputs and benefits from discrete engagements, although typically for the same client.