Description of Bluemetrix Data Manager
Bluemetrix's flagship management application, BDM Control, is a suite of data and governance control capabilities that integrates with your data and governance processes to create a single view of your data governance. When applied to your data, it captures and extracts data access and governance enforcement information from your pipelines and auto-populates your governance tools, ensuring they are up to date at all times.
BDM allows a non-technical resource to build, schedule, transform, ingest and manage data pipelines inside Hadoop without having to write any code or know the underlying Hadoop environment. It applies automation to a range of different tasks so that the necessary code and commands are created and deployed as required. BDM fully complements the Hadoop ecosystem and creates no proprietary code.
It works exclusively on the Spark environment within Hadoop.
BDM is a framework for the Ingestion, Masking, Translation, Transformation, Governance, Validation, Management and Quality Assurance of Data on Hadoop.
Data Ingest
● Simple template-based Connector system for all data sources
● Multiple Connectors available
● No need to develop any ingest code or select appropriate Hadoop components
● New data sources can be deployed in hours rather than weeks or months
● Storage can be selected to suit the data type and processing requirement, e.g. Hive, HBase, etc. (see the ingest sketch after this list)
● No extra code is developed, reducing the code release cycle time and complexity
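To illustrate the kind of ingest code BDM generates and deploys automatically, below is a minimal PySpark sketch of a template-style ingest from a relational source into Hive. The JDBC URL, credentials and table names are hypothetical placeholders, not BDM configuration, and the logic is far simpler than a real connector.

```python
from pyspark.sql import SparkSession

# Minimal sketch of a template-driven ingest: read from a JDBC source and
# land the data in a Hive table. All connection details and names below are
# hypothetical placeholders.
spark = (
    SparkSession.builder
    .appName("bdm-style-ingest-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

source = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://source-host:5432/sales")  # placeholder source
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# The storage target (Hive here) would be chosen per data type and
# processing requirement.
source.write.mode("append").saveAsTable("landing.orders")
```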
Data Masking/Tokenization
● Data Masking is available on ingest to the cluster
● It can be carried out on a column or table basis
● Stateful and Stateless Tokenization solutions are available
● Different masking algorithms can be applied to suit the data (several are illustrated in the sketch after this list), e.g.:
⮚ Complete removal of selected columns
⮚ Replace values with random data
⮚ Add a random value to each row in the table
⮚ Categorize data e.g. exact salary replaced with a range
⮚ Geolocation data – apply rotation methods to mask the data
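A minimal PySpark sketch of a few of these masking techniques is shown below, assuming a hypothetical customer table; the column names, the SHA-256 tokenization and the salary bands are illustrative choices, not BDM's actual algorithms.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bdm-style-masking-sketch").getOrCreate()

# Hypothetical customer data used only for illustration.
df = spark.createDataFrame(
    [("Alice", "077-1234", 52000), ("Bob", "087-9876", 83000)],
    ["name", "phone", "salary"],
)

masked = (
    df
    # Complete removal of a selected column
    .drop("phone")
    # Stateless tokenization: replace the value with a one-way hash
    .withColumn("name", F.sha2(F.col("name"), 256))
    # Add a random value to each row in the table
    .withColumn("noise", F.rand(seed=42))
    # Categorize data: exact salary replaced with a range
    .withColumn(
        "salary_band",
        F.when(F.col("salary") < 60000, "<60k").otherwise(">=60k"),
    )
    .drop("salary")
)

masked.show()
```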
Data Quality & Validation
● Data Consistency is guaranteed by applying checksums and other controls on the data
● Data Integrity is provided by Regular Expression and ML algorithms (see the validation sketch after this list)
● All data quality metrics are accessible through a dashboard that provides a snapshot of the health of the data on the cluster
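As an illustration only, the following PySpark sketch shows the style of checks described above: regular-expression integrity rules plus a simple checksum and row count that can be reconciled against values recorded at the source. The columns, patterns and CRC32 checksum are assumptions for the example, not the product's actual controls.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bdm-style-validation-sketch").getOrCreate()

# Hypothetical ingested data used only for illustration.
df = spark.createDataFrame(
    [("a@example.com", "100"), ("not-an-email", "10O")],
    ["email", "amount"],
)

# Regular-expression integrity checks: flag rows that fail the expected pattern.
checks = (
    df.withColumn("email_ok", F.col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"))
      .withColumn("amount_ok", F.col("amount").rlike(r"^[0-9]+$"))
)

# Simple consistency controls: a row count and a column checksum that can be
# compared against the values captured at the source system.
summary = checks.agg(
    F.count(F.lit(1)).alias("row_count"),
    F.sum(F.crc32(F.col("email"))).alias("email_checksum"),
    F.sum(F.when(~F.col("email_ok") | ~F.col("amount_ok"), 1).otherwise(0)).alias("failed_rows"),
)

summary.show()
```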
Data Transformation
● Data transformations are coded and stored in a custom library deployed in Spark
● Data maps/flows can be created using a drag and drop interface
● Dramatic reduction in code developed and deployed
● Dramatic reduction in scripts developed
● No requirement for SQL skills or Hive knowledge to transform the data
● No requirement for Spark expertise to create transformations
● An API to the Spark library can be provided, allowing client developers to create and deploy their own Spark transformations (see the sketch after this list)
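The sketch below shows, under assumptions, what a client-developed Spark transformation might look like in PySpark: a plain function over a DataFrame of the kind a drag-and-drop data flow could compile down to and chain. The function name, columns and normalisation logic are hypothetical; BDM's actual API is not shown here.

```python
from pyspark.sql import DataFrame, SparkSession, functions as F

spark = SparkSession.builder.appName("bdm-style-transform-sketch").getOrCreate()

def normalise_customer(df: DataFrame) -> DataFrame:
    """Hypothetical custom transformation: tidy name columns and derive full_name."""
    return (
        df.withColumn("first_name", F.upper(F.trim("first_name")))
          .withColumn("last_name", F.upper(F.trim("last_name")))
          .withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))
    )

df = spark.createDataFrame([(" ada ", "lovelace")], ["first_name", "last_name"])

# DataFrame.transform lets such functions be chained declaratively, the style a
# drag-and-drop data flow would ultimately produce.
df.transform(normalise_customer).show()
```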
Data Governance & Lineage
● All data governance capabilities – Audit, Change Tracking, etc. – are built into Atlas
● Governance functionality can be easily customized to add new data and features, e.g. the addition of new GDPR compliance tags (see the tagging sketch after this list)
● Process is completely independent of the end user and happens in the background
● The only solution on the market today with end-to-end data governance enabled on Atlas
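For illustration, the sketch below attaches a classification (e.g. a GDPR compliance tag) to an entity through the Apache Atlas v2 REST API, the kind of background update a governance process performs. The host, credentials, entity GUID and the "GDPR_PII" classification name are all placeholders; this is not BDM's internal mechanism.

```python
import requests

# Minimal sketch: attach a classification (tag) to an entity in Apache Atlas
# via its v2 REST API. Host, credentials, GUID and classification name are
# placeholders for illustration only.
ATLAS_URL = "http://atlas-host:21000"
ENTITY_GUID = "00000000-0000-0000-0000-000000000000"

resp = requests.post(
    f"{ATLAS_URL}/api/atlas/v2/entity/guid/{ENTITY_GUID}/classifications",
    json=[{"typeName": "GDPR_PII"}],  # classification type must already exist in Atlas
    auth=("admin", "admin"),
)
resp.raise_for_status()
```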
Bluemetrix was one of the first companies in Europe to use Hadoop, in 2009, and since 2016 we have carried out over 400 Hadoop Big Data implementations for major enterprises across Europe in all industry sectors – Automotive, Finance, Insurance, Healthcare, Retail, Government, etc. These projects cover the full spectrum of activities, from Architecture, Design, Development, Infrastructure, Security and Implementation to Operations.