|
Best Practice in Data Management |
144 |

Best Practice in
Data Management

Barry Williams
Chapter 3. Data Architectures for the Future 42
Chapter 4. Data Management – Getting Started 51
Chapter 5. Data Warehouses and Data Marts 56
Chapter 6. Design Patterns for Data Models 67
Chapter 7. Enterprise Data Models 81
Chapter 8. Knowledge Management 92
First Edition: London, 2011
ISBN : 978-1-4660-0746-8
Barry Williams Principal Consultant Database Answers Ltd. London, England
This is a collection of Essays on Best Practice in Data Management.
Data Management is like a ‘Slowly-Changing Dimension’
It changes imperceptibly and then after about a year (on the average) you realize that the landscape has changed.
Our intention in this book is to capture and define Best Practice at a particular point in time and then keep it up-to-date with new versions of the Book every quarter.
We have started with the Topics that we find come up most often in questions to our Database Answers Web Site.
I hope you enjoy this Book and would be very pleased to have your comments at barryw@databaseanswers.org.
Also, please contact us if there is a topic that you would like to see included.
Data Management covers a wide range of Topics and each of them is the subject of discussion in many organisations trying to clarify the ‘As-Is’, the ‘To-Be’ and how to get from the first to the second.
Our intention with this book is to provide a reference for the selected Topics in Data Management.
You will learn:
An understanding of what each Topic involves.
Some Templates to use to get started
Which organisations and individuals provide ‘Thought Leadership’
In this book, we cover some important topics in Data Management, including :-
Business Intelligence (BI)
Checking the Quality of a Data Model
Data Warehouses and Data Marts
Knowledge Management
Metadata Management
Here is the link for LinkedIn Group on BI :-
This Chapter provides a Road Map for the implementation of a BI Architecture.
This diagram shows the four Layers in the Architecture, where each Layer is supported with a set of Best Practice specifications, Data Models and Templates.
BI
and Reporting Layer




Data
Warehouse and Mart Layer Operational
Data Store Layer Etc. CRM

Data
Integration Layer
– Single
Version of the Truth –
Enterprise Data Model -
ETL Operations Billing

This table shows the deliverables and artefacts at each Stage in the end-to-end BI Architecture.
Business
Requirements and Design Walkthrough
BI Architecture

Deliverables
- Best Practice and Templates

Report Templates
Reporting
Layer



Generic
Design and Models for Data Marts
Data
Mart Layer



Data
Warehouse
Data
Transformation - Mapping to DWH in line with Enterprise Data Model
Data
Integration Layer

Data
Quality (Validation and Clean-up)
Data
Profile

Data
Staging Area
Operational
Data Store Layer

Best Practice involves identifying the User Requirements, formalising these in a document which is agreed to and signed off before development work begins
BI and Reports take data from Data Marts and many of the same considerations apply when it comes to determining Best Practice.
One difference is that is necessary to have a clearer understanding of the business operations and how the right kind of Performance Reports can provide insight to the business users.
This leads to the need for a management education process to be in place so that the evolution of Performance Reports can be planned in a logical manner, from basic summaries, to KPIs, Dashboards and so on.
This Data Models shows the Data Warehouse for the FBP 3 Project. It is called a Reporting Database for consistency with the previously existing Project documentation.

This Template is taken from a Sales Audit Report
|
Sales and Refunds for April 1st. 2011 |
|||||||||||||||||||||||
|
Excluding Settled Transactions |
|||||||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||||||
|
TRAN Date |
|
Reg |
Txn |
Tran Type |
User ID |
|
Amount |
|
|
||||||||||||||
|
6001760501001907996 |
27/06/11 |
50100 |
2 |
7996 |
Issue |
hlobo |
10.00 |
|
|||||||||||||||
|
6001760501006665185 |
27/06/11 |
50100 |
2 |
5185 |
Issue |
hlobo |
50.00 |
|
|||||||||||||||
|
6001760501006665193 |
27/06/11 |
50100 |
2 |
5193 |
Issue |
hlobo |
26.00 |
|
|||||||||||||||
|
6001760501006665201 |
27/06/11 |
50100 |
2 |
5201 |
Issue |
hlobo |
10.00 |
|
|||||||||||||||
|
6001760501006665219 |
27/06/11 |
50100 |
2 |
5219 |
Issue |
hlobo |
20.00 |
|
|||||||||||||||
|
6001760501006665227 |
27/06/11 |
50100 |
2 |
5227 |
Issue |
hlobo |
20.00 |
|
|||||||||||||||
|
6001760501007852014 |
27/06/11 |
50100 |
8 |
2014 |
Issue |
masmith |
6.00 |
|
|||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||||||
|
6001760501008109703 |
27/06/11 |
50100 |
8 |
9703 |
Issue |
masmith |
5.99 |
|
|||||||||||||||
|
6001760501008109893 |
27/06/11 |
50100 |
8 |
9893 |
Issue |
masmith |
5.99 |
|
|||||||||||||||
|
6001760501010717113 |
27/06/11 |
50100 |
2 |
7113 |
Issue |
hlobo |
23.98 |
|
|||||||||||||||
|
6001760501011277653 |
27/06/11 |
50301 |
1 |
2166 |
AdjustDec |
syassir |
-11.50 |
|
|||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||||||||
|
6001760501011566451 |
27/06/11 |
50100 |
5 |
6451 |
Issue |
hchiwere |
10.00 |
|
|||||||||||||||
|
6001760514000081507 |
27/06/11 |
50100 |
2 |
1507 |
Issue |
hlobo |
26.00 |
|
|||||||||||||||
|
6001760514000081515 |
27/06/11 |
50100 |
2 |
1515 |
Issue |
hlobo |
26.00 |
|
|||||||||||||||
|
6001760514000081523 |
27/06/11 |
50100 |
2 |
1523 |
Issue |
hlobo |
41.00 |
|
|||||||||||||||
|
6001760514000081549 |
27/06/11 |
50100 |
2 |
1549 |
Issue |
hlobo |
26.00 |
|
|||||||||||||||
This Stage produces and delivers Performance Reports for management
Report Templates supported by the appropriate Generic software are required.
This Stage produces and delivers BI and Performance Reports to management :-
It must be responsive to requests for change.
Users requirements are always evolving
Therefore the approach and supporting software must be flexible
Report Templates supported by the appropriate Generic software are required.
This Stage delivers Performance Reports that meet the requirements of all levels of management.
There is a need to be responsive to requests for change.
User Requirements are always evolving.
Therefore the approach and reporting software tool must be flexible.
A sensible approach is to develop Reporting Templates supported by the appropriate Generic Software.
|
Template Name |
|
||||
|
Date |
|
||||
|
User Category |
Weekly Totals |
Traffic Lights |
Dashboards |
KPIs |
Other |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Template Name |
Report Maturity Level |
||||
|
Date |
April 1st. 2010 |
||||
|
User Category |
Weekly Totals |
Traffic Lights |
Dashboards |
KPIs |
Other |
|
Finance |
Common |
In use |
In use |
In use |
Mashups |
|
HR |
Common |
Aware |
Unaware |
Aware |
|
|
Operations |
Common |
Unaware |
Unaware |
Unaware |
|
|
|
|
|
|
|
|
This Template is used to keep track of the availability of Master Data Models and Data Marts.
|
Template Name |
|
||
|
Date |
|
||
|
Category |
Master Data Models |
Data Marts |
|
|
Customers |
|
|
|
|
Finance |
|
|
|
|
Products |
|
|
|
|
Purchase Orders |
|
|
|
|
Stores |
|
|
|
|
Warehouses |
|
|
|
|
Template Name |
Data Model Availability |
||
|
Date |
March 18th. 2010 |
||
|
Category |
Master Data Models |
Data Marts |
|
|
Finance |
N/A |
N/A |
|
|
HR |
N/A |
N/A |
|
|
Operations |
N/A |
N/A |
|
|
|
|
|
|
|
Movements |
N/A |
N/A |
|
|
NCTS |
Available |
N/A |
|
|
Products |
SEED |
N/A |
|
|
Customer |
DTR but needs work |
N/A |
|
|
Warehouses |
DTR but needs work |
N/A |
|
|
Report Name |
|
||||
|
Date Produced |
|
||||
|
Product Name |
Week 1 Date |
Week 2 Date |
Week 3 Date |
Week 4 Date |
Total |
|
|
<Value in £’s> |
<Value in £’s> |
<Value in £’s> |
<Value in £’s> |
<Value in £’s> |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Weekly Totals |
<Value in £’s> |
<Value in £’s> |
<Value in £’s> |
<Value in £’s> |
<Value in £’s> |
|
Grand Total |
|
|
|
|
<Value in £’s> |
These figures are fictitious.
|
Report Name |
Value of Weekly product Movements |
||||
|
Date Produced |
March 18th. 2010 |
||||
|
Product Name |
Dec 6th 2009 |
Dec 13th 2009 |
Dec 20th 2009 |
Dec 27th 2009 |
Total |
|
Beer |
£40,000 |
£60,000 |
£70,000 |
£80,000 |
£160,000 |
|
Cigarettes |
£50,000 |
£60,000 |
£70,000 |
£80,000 |
£160,000 |
|
Cigars & cigarillos |
£25,000 |
£30,000 |
£31,000 |
£32,000 |
£118,000 |
|
Leaded Petrol |
£90,000 |
£91,000 |
£92,000 |
£93,000 |
£366,000 |
|
Unleaded Petrol |
£100,000 |
£120,000 |
£133,000 |
£140,000 |
£490,000 |
|
|
|
|
|
|
|
|
Weekly Totals |
£205,000 |
£361,000 |
£396,000 |
£425,000 |
|
|
Grand Total |
|
|
|
|
£1,300,000 |
These questions are from this page :-
The value and benefits of Reports are always a major part of the justification of the cost of designing and installing a Database.
Here is a short Tutorial
Step 1. Determine if Users are ready for KPIs,Traffic Lights and Dashboards.
Step 2. Check availability of Master Data Models
Step 3. Check availability of Data Marts
Step 5. Check availability of Report Specifications and SQL Views for Reports
Step 6. Perform ‘Gap Analysis’ to identify any missing data that must be sourced.
Step 7. Analyse common aspects of requirements for Performance Reports
There are three Templates in this Section :-
User Report Maturity Level
Availability of Master Data Models and Data Marts
Templates for Performance Reports
These questions are taken from this page :-
Here's another Kick-Start Tutorial :-
Step 1. Assess the level of Maturity of the Users concerning KPIs, Dashboards,etc..
Step 2. Check availability of Master Data Models and Data Marts
Step 3. Check availability of Report Specifications and SQL Views for Reports
Step 5. Tailor the Approach accordingly
Step 6. Aim for results in 6 months and interim results in 3 months
If you have a Question that is not addressed here, please feel free to email us your Question at barryw@databaseanswers.org.
Check for :-
a Statement of User Requirements
ideally with specifications of Templates
Software Design Patterns.
Data in Excel Spreadsheets is structured in tabular format which corresponds exactly to the way in which data is stored in relational database.
Also Spreadsheets are commonly used and the data frequently needs to be integrated with other data within an organization.
Therefore we would expect to find a wide range of solutions are available to solve this problem.
Here is a small sample :-
• An ODBC connection can be established for a spreadsheet.
• Informatica allows Spreadsheets to be defined as a Data Source.
• Microsoft’s SQL Server Integrated Services also lets Excel be defined as a Source.
• Oracle provides a facility to define EXTERNAL table which can be Spreadsheets.
• Salesforce.com provides their Excel Communicator.
In order to always respond to this situation appropriately, it is necessary to have an Information Catalogue, a Data Architecture and Data Lineage.
The solution then involves the following Steps :-
Step 1) Produce a draft Report for the Chief Execs approval
Step 2) Trace the lineage and perform a ‘gap analysis’ for all new data items.
Step 3) Talk to the Data Owners and establish when and how the data can be made available.
Step 4) Produce a Plan and timescale
Step 5) Review your Plan with the Chief Exec and obtain this agreement and formal sign-off.
Step 6) Deliver !!!
Reports for Senior Management fall into two categories :-
Standard Reports
On-demand reports
For Standard Reports it is possible to define Templates.
For On-demand Reports, the aim is to define a flexible approach to be able to respond to changes to Requirements in a timely manner.
The key action here is to establish a unified Reporting Data Platform.
This will involve aspects previously discussed, including MDM, CMI and will certainly involve Data Lineage.
Senior Management will want to take a view of the integrated data and not focus on details of derivation.
Therefore, we have to follow the MDM approach with Data Lineage for each item in the Integrated Performance Reports.
Key Performance Indicators (‘KPIs’)
Question : What are Key Performance Indicators (‘KPIs’)
Key Performance Indicators (‘KPIs’) are in common use and represent one aspect of Best Practice.
A variation of this approach are Key Quality Indicators,(‘KQIs’) which are used to monitor and manage Data Quality.
Dashboards and Scorecards are often used in association with KPIs and KQIs.
This diagram shows the four major Stages in delivering a ‘Single View of the Truth’ :-
An Enterprise Data Warehouse is a repository of all the data within the enterprise that is used in Performance Reports.
It is intended to guarantee a ‘Single Version of the Truth’ .
It typically stores very detailed data, such as Sales Receipts, and can maintain historical data going back many years.
A Data Mart, on the other hands, stores data for specific functional areas, such as Purchases and Sales, and the data is usually limited in timescale and might even have a limited life span.
A Data Warehouse can be either a single very large Repository of Data or it can be built as an interlocking set of Data Marts.
Each Data Mart would store data related to a separate business area, such as Sales, or for a specific Report or family of Reports.
In passing, we can mention that there are two well-known authorities in the broad field of Data Warehouses and Data Marts.
The first is Bill Inmon, who favours the first approach of large, all-encompassing Data Warehouses.
The second is Ralph Kimball, who favours related Data Marts.
Inmon and Kimball both write well and present convincing arguments for their points of view.
A sensible approach is to start with a single Data Warehouse and then to create Data Marts for specific business requirements as they occur.
In order to link Data Marts, they need to share the same values for the same Dimensions, such as Stores or Products. These are called ‘Conformed Dimensions’.
Without Conformed Dimensions, it is impossible to compare and accumulate related values in Data Marts.
An Agile Approach is very important because it is inevitable that user requirements will change from time to time.
We can predict three Phase in the evolution of User Requirements :-
‘Give me everything’
Give me these Reports on a regular basis and give me an ad-hoc Enquiry facility.
I want integrated KPIs and Dashboards
I want to be notified automatically if I have any situation requiring urgent attention
In order to meet these Requirements, we need to put in place an Agile Data Warehouse with flexible Data Marts and an integrated BI Toolkit.
A Conformed Dimension is one that has the same value across all Subject Areas.
Conformed Dimensions are therefore often Master Data.
Although, of course, a Conformed Dimension is not necessarily Master Data,
The best example is probably Dates.
In
this example, the Date field is a Conformed Dimension for the
Purchasing and Sales Data Marts, but Suppliers and Stores are not.
Ticket
Dimension -Ticket
Number
Purchasing
Data Mart -Date PO
Issued -Supplier ID
-PO ID Suppliers
Dimension – Supplier ID Calendar
Dimension - Date Sales
Data Mart -Date of Sale
-Store Number Stores
Dimension – Store Number





This table conveys the levels of conformance within a Dimension by grouping the base Dimension with conformed rollup totals.
The two let-hand columns are Dimensions and the top column shows Facts.
The ‘Yes’ fields indicate the Dimensions that have to be conformed in order for the analyses to be valid.
They show that if we have Product-level data then the Product is a conformed dimension.
They are illustrative for discussion purposes.
|
|
|
Orders |
Shipments |
Inventories |
Sales |
Returns |
Demand Forecast |
|
Date |
Day |
|
Yes |
|
Yes |
Yes |
|
|
|
Week |
Yes |
|
Yes |
|
|
Yes |
|
|
Month |
|
|
|
|
|
|
|
Product |
Product |
Yes |
Yes |
Yes |
Yes |
Yes |
Yes |
|
|
Category |
Yes |
Yes |
|
|
|
Yes |
|
Organisation |
Warehouse |
|
Yes |
|
|
|
|
|
|
Store |
|
|
Yes |
Yes |
Yes |
|
|
|
Division |
|
|
|
|
|
|
This shows how the data in the simple example Data Warehouse can be made available in two separate Data Marts – one for Gift Cards data and the other for Damaged Goods data.

This shows the design for three representative Online Shopping Data Marts

This design shows two date fields because that is a common pattern.
However, there could be more or less than two.
In a similar way, it shows six Dimensions and six Facts for illustrative purposes but these, of course, could be any number of attributes.


What – is a Data Mart ?
A Repository of total and detailed data with a standard structure
This structure is usually a Facts Table where all the data for analysis is held, together with a number of associated Dimension Tables.
Generic software is used, support by common Report Templates
How – do we get Started ?
Step 1. Understand the User’s Data Requirements
Step 2. Determine the available Data
Step 3. Reconcile standards, reference data
Step 5. Establish a common view of the Data Platform
Step 6. Choose the product or use bespoke SQL
Step 7. Design the Templates and agree design with Users
Step 8. Populate the Templates with sample data
These questions are from this page :-
Data Marts are a Repository of summary, total and detailed data to meet User Requirements for Reports.
They always have a standard structure, called Dimensional Data Models, which means that it is possible to use Generic Software and adopt a common Approach based on Templates.
Describing a Data Mart is a good way to get User buy-in because they can easily be explained in a logical manner which is very user-friendly.
A Data Mart is a Repository of total and detailed data to meet User Reports requirements.
It always a standard structure which means can have generic software and a common approach based on Report Templates
A Data Mart design is simple and can be described to get User buy-in
Data Marts have a common design of Dimension fields and Facts.
Templates are important because they represent a tremendous Kick-Start approach to the design of Data Marts for a specific business area.
They are produced by exploiting the common design of Dimensions and Facts.
A range of Data Mart diagrams is available in the Case Studies on the Database Answers Web Site.
It provides a single point of reference for all the data available within the organisation for producing Reports
These questions are form this page :-
To get started, follow these Steps :-
Get a broad understanding of User’s Data Requirements
Establish a common view of the Data Platform
Determine the available Data
Reconcile standards, reference data
Choose the product or use bespoke SQL
Use Templates and agree design with Users
Populate Templates with sample data
Get sign-off on demo specs in 1 month, aim for results for champion in 3 months and final results in 6 months.
Adjust timescales in light of experience
Check the level of Users understanding.
Check for existence of Templates.
Every DBMS produces what is called an Execution Plan for every SELECT statement.
The steps to improving the performance involve checking this Execution Plan against the Indexes that exist, and making sure that the Query Optimizer has used the appropriate Indexes to obtain the best performance.
This is a specialized area where DBA’s spend a lot of their time when they are looking after production databases where speed is a mission-critical factor.
Data Marts are always created to support Business Intelligence, which includes Performance Reports, Balanced Scorecards, Dashboards, Key Performance Indicators and so on.
Best practice always requires user involvement and a generic design to support a flexible approach to meeting changing requirements.
Users will always want changes to their first specifications of their requirements.
The insight that they obtain from the first Reports helps them identify more precisely what their long-term requirements will be.
Therefore flexibility is important.
A well-designed Data Mart will anticipate the areas where flexibility is required.
The design process should always follow two steps :-
• Production of generic design for the Data Mart
• Implementation of the design with a specific Data Mart software product.
This Layer contains four major areas of functionality :-
* The Data Warehouse
* Enterprise Data Model
* Master Data Management / Single Version of the Truth
* ETL (Extract, Transform and Load)
This offers a Design Pattern that can apply to any high-level System or Business Function that is providing data to be loaded into the Data Warehouse.
CUSTOMERS

EVENTS
PRODUCTS
/ SERVICES
ORGANISATION








SUPPLIERS
DOCUMENTS
This shows a simple example of a Data Warehouse holding data for Gift Cards and Vendor Compliance Infractions involving Damaged Goods all the data available in one single source.

This diagram shows the Entities that contribute to the Data Landscape for the Warehouse.

This shows the design of the Generic DWH

This diagram shows the Generic Design above applied to Gift Card and Vendor Compliance data in a Retail organisation, (shown in red).

This involves establishing Master Catalogues for Customers, Products and Stores.
The Data Model for the Customer Master Catalogue is shown here and is described in more detail in a separate document.
This shows the tables involved in maintaining a Customer Master Index.

This shows validation for the Event in the EDM
|
DATA ITEM |
TYPE |
VALIDATION |
COMMENT |
|
event_id |
Integer |
Unique Internal Identifier for each specific Customer Event. |
For example, a Customer makes an Appointment. |
|
Event Reference Number |
Text (15) |
Unique External ID for each specific Event. Optional and cannot be validated. |
For example, a Housing Repair Job Number. |
|
Event Type |
Text (15) |
Reference Data. |
For example, ‘Make an Appointment’. |
|
Associate |
Integer |
Link to Associate Table. |
|
|
Contact |
Integer |
Link to Contact table. |
|
|
Establishment |
Integer |
Reference Data. |
For example, a Court for Youth Offenders or a School for Pupils. |
|
Staff |
Integer |
Optional, but links to Staff Table if specified. |
For example, a Social Worker. |
|
Supplier |
Integer |
Optional but links to Supplier Table if specified. |
For example, a Housing Contractor for repairs. |
|
Event Address |
Integer |
Optional or specified Address. |
For example, where an Offence took place. |
|
Event Outcome |
Text (15) |
Reference Data. |
For example, ‘Satisfactory’. |
|
Event Status |
Text (15) |
Reference Data. |
For example, ‘Cancelled’, ‘Pending’. |
|
Event Start Date & time |
Date |
Mandatory, < Today and Now. |
Mandatory |
|
Event End Date & time |
Date |
Validation is > Start Date |
Optional |
|
Source Table |
EDM Entity |
Comment |
|
Activity |
Event |
For example, send a Letter or carry out an Investigation. |
|
Aspect |
Event |
These are Issues |
|
Complaint |
Event |
Contains repeated Options for Gender, Handling Investigator, Stages etc.. |
|
Complaint_People |
Customer |
Includes Complainants and Contacts, such as Edwina Currie. |
|
Contact |
Contact |
People contacted with regard to Complaints. |
|
Cost |
Event_Notes |
For example, Compensation to a Complainant. |
|
PersonInv |
Customer |
Includes non-Customers, eg Contacts who have not made Complaints. |
|
Letters |
Documents |
|
|
User |
Staff |
|
|
UserGroups |
Team |
Teams of Staff, equivalent to Teams of Social Workers. |
|
DATA ITEM |
DESCRIPTION |
MIN VALUE |
MAX VALUE |
MOST COMMON VALUE |
COMMENTS |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
DATA ITEM |
DESCRIPTION |
MINVAL |
MAX VALUE |
MOST COMMON VALUE |
COMMENTS |
|
Withdrawn Date |
Date Customer’s Approval withdrawn |
Dec-31-1998 |
Jan-1-2010 |
Jun-15-2008 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
DATA ITEM |
DESCRIPTION |
Nullable |
RULES |
DATE |
% QUALITY |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
DATA ITEM |
DESCRIPTION |
Nullable |
RULES |
DATE |
% QUALITY |
|
Withdrawn Date |
Date Customer’s Approval is withdrawn |
Yes |
>Start Date |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
ETL Transformations |
|
||||||
|
Project Title |
|
||||||
|
Known As: |
|
||||||
|
Development End date: |
|
||||||
|
Additional Comments: |
|
|
|
||||
|
Trigger |
|
|
|
|
|
||
|
Source (eg Table) |
Data Item |
Data Type |
Target (eg XML File) |
Data Item |
Data Type |
Job Schedule |
Rule Specification |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Specifications taken from migrating sample Customer data.
|
Mapping Specifications |
|
|||||
|
Project Title: Creation of a Data Extract for Customers |
|
|||||
|
Date: April 1st. 2010 |
|
|||||
|
Additional Comments: These Specifications are subject to review by Stakeholders. |
|
|
||||
|
Trigger |
When CUSTOMERS.DAT_VAL = SYSDATE |
|
|
|
||
|
Source |
(Include DB type and name) |
Data Type |
Field Name |
Data Type |
Transf Rule |
|
|
Table |
Column |
|
Table |
Column |
|
|
|
|
|
|
|
|
|
|
|
CUSTOMERS |
ID |
NVARCHAR2(8) |
OFFICE |
Office Unique ID |
CHAR(8) |
|
|
CUSTOMERS |
DAT_VALID |
DATE |
|
|
|
|
|
CUSTOMERS |
PHON_NUMBER |
NVARCHAR2(35) |
|
|
|
|
|
CUSTOMERS |
FAX_NUMBER |
NVARCHAR2(35) |
|
|
|
|
|
CUSTOMERS |
TELEX_NUMBER |
NVARCHAR2(35) |
|
|
|
|
|
CUSTOMERS |
E_MAIL_ADDRESS |
NVARCHAR2(70) |
|
|
|
|
|
CUSTOMERS |
COUNTRY_ID |
NVARCHAR2(2) |
OFFICE |
Country Code |
CHAR(2) |
Copy As is |
|
CUSTOMERS |
TRADING_ROLE |
NVARCHAR2(1) |
|
|
|
|
|
CUSTOMERS |
POST_CODE |
NVARCHAR2(9) |
|
|
|
|
|
CUSTOMERS |
REG_CODE |
NVARCHAR2(3) |
|
|
|
|
|
CUSTOMERS |
GEO_INF_CODE |
NVARCHAR2(8) |
|
|
|
|
|
|
|
|
|
|
|
|
Data Integration provides a one view of the truth for things of importance to the organisation, such as Customers, Products and Movements.
It includes Data Quality, Master Data Management and mapping specifications.
Here is the Web Link for the Questions on the Database Answers Web Site :-
Data Integration is concerned with combining data from various Sources into one consistent stream.
It provides an essential Single View of Data, for example, a Single View of a Customer.
It also provides a natural point at which Data Quality can be addressed.
At this Stage, Data Quality can be assessed and a Single View of a Customer can be achieved.
When Data Quality is of a uniform good quality, it can be integrated and made available as a consistent View.
This will be supported using a Glossary, as described in the Information Catalog Stage.
The current incarnation of Data Integration is Master Data Management,(MDM).
Data Integration provides a one view of the truth for things of importance to the organisation, such as Traders, Products and Movements.
It provides a natural point at which data quality can be addressed.
When Data is of uniform good quality it can be integrated and made available as a consistent View.
This leads naturally to Master Data Management,(MDM).
Details of the Integration, such as mapping specifications, are held in a Glossary, which is described in Stage 7.
Some key points :-
Data Integration is concerned with combining data from various Sources into one consistent stream.
It provides an essential Single View of Data, for example, a Single View of a Customer.
It also provides a natural point at which Data Quality can be addressed.
At this Stage, Data Quality can be assessed and a Single View of a Customer can be achieved.
When Data Quality is of a uniform good quality, it can be integrated and made available as a consistent View.
This will be supported using a Glossary, as described in the Information Catalog Stage.
The current incarnation of Data Integration is Master Data Management,(MDM).
Data Integration provides a ‘Single View of the Truth’ for the things of importance to the organisation, such as Traders, Products and Movements.
It provides a natural point at which data quality can be addressed.
When Data is of uniform good quality it can be integrated and made available as a consistent View.
This leads naturally to Master Data Management,(MDM).
Details of the Integration, such as mapping specifications, are held in a Glossary, which is described in Stage 7.
One of the major components in Master Data Management (‘MDM’) is Customers.
MDM can be defined a ‘Providing a Single View of the Things of Importance within an organisation’
Master Data Management applies the same principles to all the ‘Things of Interest’ in an organisation.
This can typically include Employees, Products and Suppliers.
We have discussed ‘A Single View of the Customer’ and MDM involves the same kind of operations as a CMI.
That is, identification and removal of duplicates, and putting in place to eliminate duplicates in any
new data loaded into the Databases.
There is a wide choice of software vendors offering MDM products.
De-duplication and Address validation is a niche market in this area.
On the Database Answers Web Site, there is a Tutorial on Getting Started in MDM
There is a sister Web Site devoted to the topic of MDM-As-a-Service
Wikipedia has some useful entries on Conceptual Models, Logical Models and Data Models.
Conceptual Data Models do not conventionally show Foreign Keys and are very useful for
making clear the Entities and Relationships in a Data Model without any Keys or Attributes.
They are very useful for discussing Requirements with Users because they show only the basics.
Logical Data Models add Foreign Keys and Attributes.
They are very useful for publishing a complete statement of the data involved.
Physical Data Models are very close to the Database design.
They are very useful for discussions between the Data Analyst, DBAs and developers.
Wikipedia has an entry on ETL which is worth a look.
ETL stands for Extract, Transform and Load.
• Extract means Extracting data from Data Sources.
• Transform covers many tasks, including –
O Selection of the data of interest
o Validation and clean-up of the selected data
o Changing the format and content of the data
o Loading into the designated Target.
In practice, there are three options for implementing ETL:-
• Develop bespoke SQL
• Use a commercial package, such as Informatica or Microsoft’s Integration Services
• Some combination of these two.
For example, developing basic SQL to clarify the Requirements and
then looking for a commercial product to meet the Requirements.
Data Lineage can be defined as the ability to the trace the derivation of all items of data that
appear in any important Performance Reports and Management Information.
That includes :-
• Who owns the original source data
• What validation and transformations are applied to the data in its life cycle
It provides one view of the truth
It offers a point at which Data Integrity can be measured and User involvement obtained to improve Quality until it meets User standards.
Data Profiling is a good starting-point for determining the quality of the data and drafting some simple validation and transformation that can be used to get started. For example, replace LTD by LIMITED (or vice versa), and ‘&’ by AND.
The Design Approach requires Data Models for the areas of the within Scope.
It will also require Generic Data Models to support one view of the truth for major entities, such as Traders or Customers.
This one view will be implemented as Master Data Management (MDM).
Get a broad understanding of the data available
Establish a common view of the Data Platform
Get a broad understanding of Data Sources
Determine the available Data
Choose the MDM product
Determine strategy for Clouds – e.g. Reference Data available globally
In 1 month, produce Generic Data Models
In 3 months, confirm GDM with sample data and Facilitated Workshops and choose MDM product.
In 6 months, implement MDM and publish GDM and CMI on the Intranet.
Adjust timescales in light of experience
Data Integration covers a number of Steps, each of which can have its own Templates.
Examples are included here for Data Profiling and Mapping Specifications.
Step 1. Start with Data Profiling because it is a good starting-point for determining the quality of the data and drafting some simple validation and transformation that can be used to get started.
For example, replace LTD by LIMITED (or vice versa), and ‘&’ by AND.
Step 2. Determine the available Data Models for major areas of the enterprise.
Step 3. Determine whether Generic Data Models are available to support one view of the truth for major entities, such as Customers or Offices.
This one view approach will be implemented as Master Data Management.
Step 5. Establish a common view of the Data Platform – Reference Data, Customers, products, Movements and so on.
Step 6. Determine the available Data
Step 7. Choose an MDM product or decide on in-house SQL development.
These Steps define a Tutorial of Best Practice :-
Step 1. Define the Target which is usually a ‘Single View Data Model’.
Step 2. Define the Data Sources
Step 3. Define the Mapping Specifications from the Sources to the Target.
Step 5. Define the Data Platform
Step 6. Identify Standards to be followed.
This Tutorial is described in detail in a separate document, entitled Data_Integration_Tutorial.doc
These questions come from this page :-
If you have a Question that is not addressed here, please feel free to email us your Question.
Look for the existence of the following items :-
Generic Data Models
An Enterprise Data Platform
Identify the Data Sources
Selection of a MDM Product
Implementation of a Customer Master Index or appropriate alternative
Data Profiling is a good starting-point for determining the quality of the data and drafting some simple validation and transformation that can be used to get started.
For example, replace LTD by LIMITED (or vice versa), and ‘&’ by AND.
The Design Approach requires Data Models for the areas of the within Scope.
It will also require Generic Data Models to support one view of the truth for major entities, such as Traders or Customers.
This one view will be implemented as Master Data Management (MDM).
Get a broad understanding of the data available
Establish a common view of the Data Platform
Get a broad understanding of Data Sources
Determine the available Data
Choose the MDM product
Determine a Strategy for Clouds – e.g. Reference Data available globally
In 1 month, produce Generic Data Models
In 3 months, confirm GDM with sample data and Facilitated Workshops and choose MDM product.
In 6 months, implement MDM and publish GDM and CMI on the Intranet.
Adjust timescales in light of experience
Data Integration covers a number of Steps, each of which can have its own Templates.
Examples are included here for Data Profiling and Mapping Specifications.
A successful Strategy for Data Quality as an Enterprise Issue must include both organization and technical aspects.
Typical Organization aspects are :-
Commitment from senior management
Establishing the slogan “Data Quality is an Enterprise Issue” as a top-down edict.
Identification of the ‘Top 20’ Applications and Data Owners across the Enterprise
Agree sign-off procedures with Data Owners and Users
Technical aspects
Establish Key Quality Indicators (‘KQIs), for example Duplicate Customers records
Agree target Data Quality percentage
Define KQI Reports and dashboards
Develop SQL to measure KQIs
Define procedures to improve KQIs
This could include Oracle, SQL Server and DB2.
The key to handling multiple types of Database is to thing of them in terms of an Integrated
Data Platforms, where all types of data are presented in a common fashion.
This then defines the logical requirement.
There is a then a number of options to physically meet this logical requirement.
The Enterprise-level option is to use an appropriate commercial product, such as Informatica
Best Practice involves establishing documentation standards.
For example, each table would be preceded by MIR_ to indicate that the structure of the table ‘Mirrors’ the structure of the Data Source.
A Staging Area will need to be established.
|
SOURCES |
CONTACT |
TYPE |
DATA ITEMS |
COMMENTS |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
SOURCES |
CONTACT |
TYPE |
DATA ITEMS |
COMMENTS |
|
CRM |
Joe Bloggs |
TBD |
Customers |
‘Golden Source’ |
|
Billing |
Joe Bloggs |
Oracle DB |
Stores |
Official Source |
|
Operations |
|
|
|
|
A Repository for all major Applications, Databases, Spreadsheets and so on.
Data and information related to each Stage in the Best Practice Road Map
This includes details of People, Roles and Responsibilities, Applications, Databases
These questions are from this page :-
Data Sources include all major places where important data is created or used, including :-
Applications
Databases
Spreadsheets
XML files, and so on
It also includes Information related to each Stage in the Best Practice Road Map on People, Roles and Responsibilities.
This Information is stored within an Information Catalog.
A Repository record Data Sources for all major Applications, Databases, Spreadsheets and so on, data and information related to each Stage in the Best Practice Road Map