Excerpt for Best Practice in Data Management by Barry Williams, available in its entirety at Smashwords

Best Practice in Data Management

144


Best Practice in

Data Management



Barry Williams







First Edition: London, 2011

ISBN : 978-1-4660-0746-8



Barry Williams Principal Consultant Database Answers Ltd. London, England

Chapter 1. Welcome

1.1 What is this?

This is a collection of Essays on Best Practice in Data Management.

Data Management is like a ‘Slowly-Changing Dimension’

It changes imperceptibly and then after about a year (on the average) you realize that the landscape has changed.

Our intention in this book is to capture and define Best Practice at a particular point in time and then keep it up-to-date with new versions of the Book every quarter.

We have started with the Topics that we find come up most often in questions to our Database Answers Web Site.

I hope you enjoy this Book and would be very pleased to have your comments at barryw@databaseanswers.org.

Also, please contact us if there is a topic that you would like to see included.



1.2 Why is it important?

Data Management covers a wide range of Topics and each of them is the subject of discussion in many organisations trying to clarify the ‘As-Is’, the ‘To-Be’ and how to get from the first to the second.



Our intention with this book is to provide a reference for the selected Topics in Data Management.



1.3 What Will I Learn?

You will learn:

  • An understanding of what each Topic involves.

  • Some Templates to use to get started

  • Which organisations and individuals provide ‘Thought Leadership’



1.4 Topics

In this book, we cover some important topics in Data Management, including :-

  • Business Intelligence (BI)

  • Checking the Quality of a Data Model

  • Data Warehouses and Data Marts

  • Knowledge Management

  • Metadata Management





Chapter 2. BI Architectures

Here is the link for LinkedIn Group on BI :-



2.1 Purpose of this Chapter

This Chapter provides a Road Map for the implementation of a BI Architecture.

2.2 Layers in the BI Architecture

This diagram shows the four Layers in the Architecture, where each Layer is supported with a set of Best Practice specifications, Data Models and Templates.

BI and Reporting Layer





Data Warehouse and Mart Layer

Operational Data Store Layer

Etc.

CRM



Data Integration Layer – Single Version of the Truth – Enterprise Data Model - ETL

Operations

Billing





2.3 Layers and Deliverables

This table shows the deliverables and artefacts at each Stage in the end-to-end BI Architecture.

Business Requirements and Design Walkthrough



BI Architecture





Deliverables - Best Practice and Templates







Report Templates

Reporting Layer







Generic Design and Models for Data Marts

Data Mart Layer







Data Warehouse





Data Transformation - Mapping to DWH in line with Enterprise Data Model



Data Integration Layer





Data Quality (Validation and Clean-up)







Data Profile





Data Staging Area

Operational Data Store Layer







2.4 The Reporting Layer

2.5.1 Best Practice

Best Practice involves identifying the User Requirements, formalising these in a document which is agreed to and signed off before development work begins

BI and Reports take data from Data Marts and many of the same considerations apply when it comes to determining Best Practice.


One difference is that is necessary to have a clearer understanding of the business operations and how the right kind of Performance Reports can provide insight to the business users.


This leads to the need for a management education process to be in place so that the evolution of Performance Reports can be planned in a logical manner, from basic summaries, to KPIs, Dashboards and so on.

2.5.2 Data Models

This Data Models shows the Data Warehouse for the FBP 3 Project. It is called a Reporting Database for consistency with the previously existing Project documentation.

2.5.3 Templates

This Template is taken from a Sales Audit Report

Sales and Refunds for April 1st. 2011

Excluding Settled Transactions













TRAN Date


Reg

Txn

Tran Type

User ID


Amount



6001760501001907996

27/06/11

50100

2

7996

Issue

hlobo

10.00


6001760501006665185

27/06/11

50100

2

5185

Issue

hlobo

50.00


6001760501006665193

27/06/11

50100

2

5193

Issue

hlobo

26.00


6001760501006665201

27/06/11

50100

2

5201

Issue

hlobo

10.00


6001760501006665219

27/06/11

50100

2

5219

Issue

hlobo

20.00


6001760501006665227

27/06/11

50100

2

5227

Issue

hlobo

20.00


6001760501007852014

27/06/11

50100

8

2014

Issue

masmith

6.00














6001760501008109703

27/06/11

50100

8

9703

Issue

masmith

5.99


6001760501008109893

27/06/11

50100

8

9893

Issue

masmith

5.99


6001760501010717113

27/06/11

50100

2

7113

Issue

hlobo

23.98


6001760501011277653

27/06/11

50301

1

2166

AdjustDec

syassir

-11.50














6001760501011566451

27/06/11

50100

5

6451

Issue

hchiwere

10.00


6001760514000081507

27/06/11

50100

2

1507

Issue

hlobo

26.00


6001760514000081515

27/06/11

50100

2

1515

Issue

hlobo

26.00


6001760514000081523

27/06/11

50100

2

1523

Issue

hlobo

41.00


6001760514000081549

27/06/11

50100

2

1549

Issue

hlobo

26.00






2.5.4 Frequently Asked Questions

What.1 – What are Performance Reports ?

    • This Stage produces and delivers Performance Reports for management

    • Report Templates supported by the appropriate Generic software are required.

What.2 - What is Business Intelligence ?

This Stage produces and delivers BI and Performance Reports to management :-

  • It must be responsive to requests for change.

  • Users requirements are always evolving

  • Therefore the approach and supporting software must be flexible

  • Report Templates supported by the appropriate Generic software are required.



This Stage delivers Performance Reports that meet the requirements of all levels of management.

  • There is a need to be responsive to requests for change.

  • User Requirements are always evolving.

  • Therefore the approach and reporting software tool must be flexible.



A sensible approach is to develop Reporting Templates supported by the appropriate Generic Software.



What.3 – How do we assess our User Report Maturity Level ?

1) Blank Template



Template Name

Date

User Category

Weekly Totals

Traffic Lights

Dashboards

KPIs

Other





2) Completed Template



Template Name

Report Maturity Level

Date

April 1st. 2010

User Category

Weekly Totals

Traffic Lights

Dashboards

KPIs

Other

Finance

Common

In use

In use

In use

Mashups

HR

Common

Aware

Unaware

Aware

Operations

Common

Unaware

Unaware

Unaware





2.2.2.2 Availability of Models and Data Marts

1) Blank Template

This Template is used to keep track of the availability of Master Data Models and Data Marts.

Template Name

Date

Category

Master Data Models

Data Marts

Customers

Finance

Products

Purchase Orders


Stores

Warehouses




2) Completed Template

Template Name

Data Model Availability

Date

March 18th. 2010

Category

Master Data Models

Data Marts

Finance

N/A

N/A

HR

N/A

N/A

Operations

N/A

N/A

Movements

N/A

N/A

NCTS

Available

N/A


Products

SEED

N/A

Customer

DTR but needs work

N/A


Warehouses

DTR but needs work

N/A


2.2.2.3 Performance Reports

1) Blank Template



Report Name

Date Produced

Product Name

Week 1 Date

Week 2 Date

Week 3 Date

Week 4 Date

Total


<Value in £’s>

<Value in £’s>

<Value in £’s>

<Value in £’s>

<Value in £’s>

Weekly Totals

<Value in £’s>

<Value in £’s>

<Value in £’s>

<Value in £’s>

<Value in £’s>

Grand Total





<Value in £’s>



2) Completed Template

These figures are fictitious.

Report Name

Value of Weekly product Movements

Date Produced

March 18th. 2010

Product Name

Dec 6th 2009

Dec 13th 2009

Dec 20th 2009

Dec 27th 2009

Total

Beer

£40,000

£60,000

£70,000

£80,000

£160,000

Cigarettes

£50,000

£60,000

£70,000

£80,000

£160,000

Cigars & cigarillos

£25,000

£30,000

£31,000

£32,000

£118,000

Leaded Petrol

  £90,000

 £91,000

£92,000

£93,000

£366,000

Unleaded Petrol

 £100,000

£120,000

£133,000

£140,000

£490,000 







Weekly Totals

£205,000

£361,000

£396,000

£425,000


Grand Total





£1,300,000

These questions are from this page :-

Why.1 - Why is this Stage important ?

The value and benefits of Reports are always a major part of the justification of the cost of designing and installing a Database.



How 1 – How do we get started ?

Here is a short Tutorial

    • Step 1. Determine if Users are ready for KPIs,Traffic Lights and Dashboards.

    • Step 2. Check availability of Master Data Models

    • Step 3. Check availability of Data Marts

    • Step 5. Check availability of Report Specifications and SQL Views for Reports

    • Step 6. Perform ‘Gap Analysis’ to identify any missing data that must be sourced.

    • Step 7. Analyse common aspects of requirements for Performance Reports



There are three Templates in this Section :-

  1. User Report Maturity Level

  2. Availability of Master Data Models and Data Marts

  3. Templates for Performance Reports



These questions are taken from this page :-





Here's another Kick-Start Tutorial :-

  • Step 1. Assess the level of Maturity of the Users concerning KPIs, Dashboards,etc..

  • Step 2. Check availability of Master Data Models and Data Marts

  • Step 3. Check availability of Report Specifications and SQL Views for Reports

  • Step 5. Tailor the Approach accordingly

  • Step 6. Aim for results in 6 months and interim results in 3 months


If you have a Question that is not addressed here, please feel free to email us your Question at barryw@databaseanswers.org.



How.2 - How do we measure progress in Business Intelligence ?

Check for :-

    • a Statement of User Requirements

    • ideally with specifications of Templates

    • Software Design Patterns.



How.3 - How do I combine Excel data in my Reports ?

Data in Excel Spreadsheets is structured in tabular format which corresponds exactly to the way in which data is stored in relational database.


Also Spreadsheets are commonly used and the data frequently needs to be integrated with other data within an organization.


Therefore we would expect to find a wide range of solutions are available to solve this problem.


Here is a small sample :-

• An ODBC connection can be established for a spreadsheet.

• Informatica allows Spreadsheets to be defined as a Data Source.

• Microsoft’s SQL Server Integrated Services also lets Excel be defined as a Source.

• Oracle provides a facility to define EXTERNAL table which can be Spreadsheets.

• Salesforce.com provides their Excel Communicator.


How.4 - How do you meet your Chief Executive’s Report requirements ?

In order to always respond to this situation appropriately, it is necessary to have an Information Catalogue, a Data Architecture and Data Lineage.


The solution then involves the following Steps :-

Step 1) Produce a draft Report for the Chief Execs approval

Step 2) Trace the lineage and perform a ‘gap analysis’ for all new data items.

Step 3) Talk to the Data Owners and establish when and how the data can be made available.

Step 4) Produce a Plan and timescale

Step 5) Review your Plan with the Chief Exec and obtain this agreement and formal sign-off.

Step 6) Deliver !!!



How.5 - How do I produce Integrated Performance Reports ?

Reports for Senior Management fall into two categories :-

      • Standard Reports

      • On-demand reports


For Standard Reports it is possible to define Templates.

For On-demand Reports, the aim is to define a flexible approach to be able to respond to changes to Requirements in a timely manner.


The key action here is to establish a unified Reporting Data Platform.

This will involve aspects previously discussed, including MDM, CMI and will certainly involve Data Lineage.

Senior Management will want to take a view of the integrated data and not focus on details of derivation.

Therefore, we have to follow the MDM approach with Data Lineage for each item in the Integrated Performance Reports.

Key Performance Indicators (‘KPIs’)

Question : What are Key Performance Indicators (‘KPIs’)

Key Performance Indicators (‘KPIs’) are in common use and represent one aspect of Best Practice.


A variation of this approach are Key Quality Indicators,(‘KQIs’) which are used to monitor and manage Data Quality.


Dashboards and Scorecards are often used in association with KPIs and KQIs.





2.5 Data Mart Layer

This diagram shows the four major Stages in delivering a ‘Single View of the Truth’ :-



2.6.1 Best Practice

An Enterprise Data Warehouse is a repository of all the data within the enterprise that is used in Performance Reports.

It is intended to guarantee a ‘Single Version of the Truth’ .

It typically stores very detailed data, such as Sales Receipts, and can maintain historical data going back many years.

A Data Mart, on the other hands, stores data for specific functional areas, such as Purchases and Sales, and the data is usually limited in timescale and might even have a limited life span.

A Data Warehouse can be either a single very large Repository of Data or it can be built as an interlocking set of Data Marts.

Each Data Mart would store data related to a separate business area, such as Sales, or for a specific Report or family of Reports.

In passing, we can mention that there are two well-known authorities in the broad field of Data Warehouses and Data Marts.

The first is Bill Inmon, who favours the first approach of large, all-encompassing Data Warehouses.

The second is Ralph Kimball, who favours related Data Marts.

Inmon and Kimball both write well and present convincing arguments for their points of view.

A sensible approach is to start with a single Data Warehouse and then to create Data Marts for specific business requirements as they occur.

In order to link Data Marts, they need to share the same values for the same Dimensions, such as Stores or Products. These are called ‘Conformed Dimensions’.

Without Conformed Dimensions, it is impossible to compare and accumulate related values in Data Marts.

2.6.2 An Agile Approach

An Agile Approach is very important because it is inevitable that user requirements will change from time to time.

We can predict three Phase in the evolution of User Requirements :-

  1. ‘Give me everything’

  2. Give me these Reports on a regular basis and give me an ad-hoc Enquiry facility.

  3. I want integrated KPIs and Dashboards

  4. I want to be notified automatically if I have any situation requiring urgent attention

In order to meet these Requirements, we need to put in place an Agile Data Warehouse with flexible Data Marts and an integrated BI Toolkit.

2.6.3 Conformed Dimensions

A Conformed Dimension is one that has the same value across all Subject Areas.

Conformed Dimensions are therefore often Master Data.

Although, of course, a Conformed Dimension is not necessarily Master Data,



The best example is probably Dates.

In this example, the Date field is a Conformed Dimension for the Purchasing and Sales Data Marts, but Suppliers and Stores are not.

Ticket Dimension

-Ticket Number

Purchasing Data Mart -Date PO Issued -Supplier ID -PO ID

Suppliers Dimension – Supplier ID



Calendar Dimension - Date

Sales Data Mart -Date of Sale -Store Number

Stores Dimension – Store Number









2.6.4 Conformance Analysis

This table conveys the levels of conformance within a Dimension by grouping the base Dimension with conformed rollup totals.

The two let-hand columns are Dimensions and the top column shows Facts.

The ‘Yes’ fields indicate the Dimensions that have to be conformed in order for the analyses to be valid.

They show that if we have Product-level data then the Product is a conformed dimension.

They are illustrative for discussion purposes.





Orders

Shipments

Inventories

Sales

Returns

Demand Forecast

Date

Day


Yes


Yes

Yes



Week

Yes


Yes



Yes


Month







Product

Product

Yes

Yes

Yes

Yes

Yes

Yes


Category

Yes

Yes




Yes

Organisation

Warehouse


Yes






Store



Yes

Yes

Yes



Division









2.6.5 Data Models

2.6.6.1 Phase 2 Data Marts

This shows how the data in the simple example Data Warehouse can be made available in two separate Data Marts – one for Gift Cards data and the other for Damaged Goods data.

2.6.6.2 Sample Data Marts

This shows the design for three representative Online Shopping Data Marts

2.6.6.3 Template for Generic Data Mart Design

This design shows two date fields because that is a common pattern.

However, there could be more or less than two.

In a similar way, it shows six Dimensions and six Facts for illustrative purposes but these, of course, could be any number of attributes.



2.6.6.3 Sample Template (in Word)



2.6.6 Frequently Asked Questions

  • What – is a Data Mart ?

    • A Repository of total and detailed data with a standard structure

    • This structure is usually a Facts Table where all the data for analysis is held, together with a number of associated Dimension Tables.

    • Generic software is used, support by common Report Templates



  • How – do we get Started ?

    • Step 1. Understand the User’s Data Requirements

    • Step 2. Determine the available Data

    • Step 3. Reconcile standards, reference data

    • Step 5. Establish a common view of the Data Platform

    • Step 6. Choose the product or use bespoke SQL

    • Step 7. Design the Templates and agree design with Users

    • Step 8. Populate the Templates with sample data

What.1 : What is a Data Mart ?

These questions are from this page :-



Data Marts are a Repository of summary, total and detailed data to meet User Requirements for Reports.



They always have a standard structure, called Dimensional Data Models, which means that it is possible to use Generic Software and adopt a common Approach based on Templates.

Describing a Data Mart is a good way to get User buy-in because they can easily be explained in a logical manner which is very user-friendly.

A Data Mart is a Repository of total and detailed data to meet User Reports requirements.



It always a standard structure which means can have generic software and a common approach based on Report Templates

A Data Mart design is simple and can be described to get User buy-in




What.2 : What are Data Mart Templates ?

Data Marts have a common design of Dimension fields and Facts.


Templates are important because they represent a tremendous Kick-Start approach to the design of Data Marts for a specific business area.

They are produced by exploiting the common design of Dimensions and Facts.


A range of Data Mart diagrams is available in the Case Studies on the Database Answers Web Site.



Why.1 : Why is this Stage important ?

It provides a single point of reference for all the data available within the organisation for producing Reports

How.1 : How do we get Started ?

These questions are form this page :-



To get started, follow these Steps :-

  • Get a broad understanding of User’s Data Requirements

  • Establish a common view of the Data Platform

  • Determine the available Data

  • Reconcile standards, reference data

  • Choose the product or use bespoke SQL

  • Use Templates and agree design with Users

  • Populate Templates with sample data

  • Get sign-off on demo specs in 1 month, aim for results for champion in 3 months and final results in 6 months.

  • Adjust timescales in light of experience



How.2: How do we measure progress with Data Marts ?

    • Check the level of Users understanding.

    • Check for existence of Templates.


How.3: How do I improve the performance of my Data Mart ?

Every DBMS produces what is called an Execution Plan for every SELECT statement.


The steps to improving the performance involve checking this Execution Plan against the Indexes that exist, and making sure that the Query Optimizer has used the appropriate Indexes to obtain the best performance.


This is a specialized area where DBA’s spend a lot of their time when they are looking after production databases where speed is a mission-critical factor.


Data Marts are always created to support Business Intelligence, which includes Performance Reports, Balanced Scorecards, Dashboards, Key Performance Indicators and so on.


Best practice always requires user involvement and a generic design to support a flexible approach to meeting changing requirements.

Users will always want changes to their first specifications of their requirements.


The insight that they obtain from the first Reports helps them identify more precisely what their long-term requirements will be.


Therefore flexibility is important.


A well-designed Data Mart will anticipate the areas where flexibility is required.

The design process should always follow two steps :-

• Production of generic design for the Data Mart

• Implementation of the design with a specific Data Mart software product.



2.6 Data Integration Layer

2.7.1 Best Practice

This Layer contains four major areas of functionality :-

* The Data Warehouse

* Enterprise Data Model

* Master Data Management / Single Version of the Truth

* ETL (Extract, Transform and Load)



2.7.2 The Canonical Enterprise Data Model

This offers a Design Pattern that can apply to any high-level System or Business Function that is providing data to be loaded into the Data Warehouse.



CUSTOMERS





EVENTS

PRODUCTS / SERVICES

ORGANISATION











SUPPLIERS

DOCUMENTS





2.7.3 Phase 1 Data Warehouse

This shows a simple example of a Data Warehouse holding data for Gift Cards and Vendor Compliance Infractions involving Damaged Goods all the data available in one single source.

2.7.4 Data Warehouse ERD

This diagram shows the Entities that contribute to the Data Landscape for the Warehouse.

2.7.4 Generic Data Warehouse

This shows the design of the Generic DWH

2.7.5 Specific Data Warehouse

This diagram shows the Generic Design above applied to Gift Card and Vendor Compliance data in a Retail organisation, (shown in red).













2.7.6 Data Model for a Single Version of the Truth

This involves establishing Master Catalogues for Customers, Products and Stores.

The Data Model for the Customer Master Catalogue is shown here and is described in more detail in a separate document.

This shows the tables involved in maintaining a Customer Master Index.







2.7.7 ETL – Template for Validation Specifications

This shows validation for the Event in the EDM

DATA ITEM

TYPE

VALIDATION

COMMENT

event_id

Integer

Unique Internal Identifier for each specific Customer Event.

For example, a Customer makes an Appointment.

Event Reference Number

Text (15)

Unique External ID for each specific Event.

Optional and cannot be validated.

For example, a Housing Repair Job Number.

Event Type

Text (15)

Reference Data.

For example, ‘Make an Appointment’.

Associate

Integer

Link to Associate Table.


Contact

Integer

Link to Contact table.


Establishment

Integer

Reference Data.

For example, a Court for Youth Offenders or a

School for Pupils.

Staff

Integer

Optional, but links to Staff Table if specified.

For example, a Social Worker.

Supplier

Integer

Optional but links to Supplier Table if specified.

For example, a Housing Contractor for repairs.

Event Address

Integer

Optional or specified Address.

For example, where an Offence took place.

Event Outcome

Text (15)

Reference Data.

For example, ‘Satisfactory’.

Event Status

Text (15)

Reference Data.

For example, ‘Cancelled’, ‘Pending’.

Event Start Date & time

Date

Mandatory, < Today and Now.

Mandatory

Event End Date & time

Date

Validation is > Start Date

Optional

2.7.7 ETL – Template for Entity Mapping


Source Table

EDM Entity

Comment

Activity

Event

For example, send a Letter or carry out an Investigation.

Aspect

Event

These are Issues

Complaint

Event

Contains repeated Options for Gender, Handling Investigator, Stages etc..

Complaint_People

Customer

Includes Complainants and Contacts, such as Edwina Currie.

Contact

Contact

People contacted with regard to Complaints.

Cost

Event_Notes

For example, Compensation to a Complainant.

PersonInv

Customer

Includes non-Customers, eg Contacts who have not made Complaints.

Letters

Documents


User

Staff


UserGroups

Team

Teams of Staff, equivalent to Teams of Social Workers.




5.1.2.9 Blank Template for Data Profiling

DATA ITEM

DESCRIPTION

MIN VALUE

MAX VALUE

MOST COMMON VALUE

COMMENTS



















5.1.2.10 Completed Template for Data Profiling

DATA ITEM

DESCRIPTION

MINVAL

MAX VALUE

MOST COMMON VALUE

COMMENTS

Withdrawn Date

Date Customer’s Approval withdrawn

Dec-31-1998

Jan-1-2010

Jun-15-2008
















5.1.2.11 Blank Templates for Data Validation

DATA ITEM

DESCRIPTION

Nullable

RULES

DATE

% QUALITY

















5.1.2.12 Completed Template for Data Validation

DATA ITEM

DESCRIPTION

Nullable

RULES

DATE

% QUALITY

Withdrawn Date

Date Customer’s Approval

is withdrawn

Yes

>Start Date



















5.1.2.13 Blank Templates for Mapping Specifications



ETL Transformations


Project Title


Known As:


Development End date:


Additional Comments:

 

 


Trigger

 

 

 


Source

(eg Table)

Data Item

Data Type

Target

(eg XML File)

Data Item

Data Type

Job Schedule

Rule Specification

 

 

 

 

 

 

 

 

 

 

 


 

 

 

 

 

 

 

 



5.1.2.14 Completed Template for Mapping Specifications

Specifications taken from migrating sample Customer data.

Mapping Specifications


Project Title: Creation of a Data Extract for Customers


Date: April 1st. 2010


Additional Comments: These Specifications are subject to review by Stakeholders.

 


Trigger

When CUSTOMERS.DAT_VAL = SYSDATE

 

 


Source

(Include DB type

and name)

Data Type

Target

Field Name

Data Type

Transf

Rule

Table

Column

 

Table

Column

 

 

 

 

 

 

 

 

 

CUSTOMERS

ID

NVARCHAR2(8)

OFFICE

Office

Unique ID

CHAR(8)

 

CUSTOMERS

DAT_VALID

DATE

CUSTOMERS

PHON_NUMBER

NVARCHAR2(35)

CUSTOMERS

FAX_NUMBER

NVARCHAR2(35)

CUSTOMERS

TELEX_NUMBER

NVARCHAR2(35)

CUSTOMERS

E_MAIL_ADDRESS

NVARCHAR2(70)

CUSTOMERS

COUNTRY_ID

NVARCHAR2(2)

OFFICE

Country

Code

CHAR(2)

Copy

As is

CUSTOMERS

TRADING_ROLE

NVARCHAR2(1)

CUSTOMERS

POST_CODE

NVARCHAR2(9)

CUSTOMERS

REG_CODE

NVARCHAR2(3)

CUSTOMERS

GEO_INF_CODE

NVARCHAR2(8)

 

 

 

 

 

 

 







2.7.3 Frequently Asked Questions

What 1 – What is Data Integration ?

Data Integration provides a one view of the truth for things of importance to the organisation, such as Customers, Products and Movements.

It includes Data Quality, Master Data Management and mapping specifications.


Here is the Web Link for the Questions on the Database Answers Web Site :-


Data Integration is concerned with combining data from various Sources into one consistent stream.


It provides an essential Single View of Data, for example, a Single View of a Customer.


It also provides a natural point at which Data Quality can be addressed.


At this Stage, Data Quality can be assessed and a Single View of a Customer can be achieved.

When Data Quality is of a uniform good quality, it can be integrated and made available as a consistent View.

This will be supported using a Glossary, as described in the Information Catalog Stage.

The current incarnation of Data Integration is Master Data Management,(MDM).

Data Integration provides a one view of the truth for things of importance to the organisation, such as Traders, Products and Movements.

It provides a natural point at which data quality can be addressed.

When Data is of uniform good quality it can be integrated and made available as a consistent View.

This leads naturally to Master Data Management,(MDM).

Details of the Integration, such as mapping specifications, are held in a Glossary, which is described in Stage 7.

Some key points :-

  • Data Integration is concerned with combining data from various Sources into one consistent stream.

  • It provides an essential Single View of Data, for example, a Single View of a Customer.

  • It also provides a natural point at which Data Quality can be addressed.

  • At this Stage, Data Quality can be assessed and a Single View of a Customer can be achieved.

  • When Data Quality is of a uniform good quality, it can be integrated and made available as a consistent View.

  • This will be supported using a Glossary, as described in the Information Catalog Stage.


The current incarnation of Data Integration is Master Data Management,(MDM).

Data Integration provides a ‘Single View of the Truth’ for the things of importance to the organisation, such as Traders, Products and Movements.



It provides a natural point at which data quality can be addressed.

When Data is of uniform good quality it can be integrated and made available as a consistent View.

This leads naturally to Master Data Management,(MDM).

Details of the Integration, such as mapping specifications, are held in a Glossary, which is described in Stage 7.



What.2 - What is Master Data Management (MDM) ?

One of the major components in Master Data Management (‘MDM’) is Customers.

MDM can be defined a ‘Providing a Single View of the Things of Importance within an organisation’


Master Data Management applies the same principles to all the ‘Things of Interest’ in an organisation.


This can typically include Employees, Products and Suppliers.

We have discussed ‘A Single View of the Customer’ and MDM involves the same kind of operations as a CMI.


That is, identification and removal of duplicates, and putting in place to eliminate duplicates in any

new data loaded into the Databases.


There is a wide choice of software vendors offering MDM products.

De-duplication and Address validation is a niche market in this area.


On the Database Answers Web Site, there is a Tutorial on Getting Started in MDM


There is a sister Web Site devoted to the topic of MDM-As-a-Service



What.3 - What are Conceptual, Logical and Physical Data Models ?

Wikipedia has some useful entries on Conceptual Models, Logical Models and Data Models.


Conceptual Data Models do not conventionally show Foreign Keys and are very useful for

making clear the Entities and Relationships in a Data Model without any Keys or Attributes.


They are very useful for discussing Requirements with Users because they show only the basics.


Logical Data Models add Foreign Keys and Attributes.

They are very useful for publishing a complete statement of the data involved.


Physical Data Models are very close to the Database design.

They are very useful for discussions between the Data Analyst, DBAs and developers.



What.4 : What does ETL stand for ?

Wikipedia has an entry on ETL which is worth a look.


ETL stands for Extract, Transform and Load.

• Extract means Extracting data from Data Sources.

• Transform covers many tasks, including –

O Selection of the data of interest

o Validation and clean-up of the selected data

o Changing the format and content of the data

o Loading into the designated Target.


In practice, there are three options for implementing ETL:-

• Develop bespoke SQL

• Use a commercial package, such as Informatica or Microsoft’s Integration Services

• Some combination of these two.

For example, developing basic SQL to clarify the Requirements and

then looking for a commercial product to meet the Requirements.


What.5 - What is Data Lineage ?

Data Lineage can be defined as the ability to the trace the derivation of all items of data that

appear in any important Performance Reports and Management Information.


That includes :-

• Who owns the original source data

• What validation and transformations are applied to the data in its life cycle



Why.1 - Why is this Stage important ?

It provides one view of the truth

It offers a point at which Data Integrity can be measured and User involvement obtained to improve Quality until it meets User standards.



How.1 - How do we get started ?

Data Profiling is a good starting-point for determining the quality of the data and drafting some simple validation and transformation that can be used to get started. For example, replace LTD by LIMITED (or vice versa), and ‘&’ by AND.

The Design Approach requires Data Models for the areas of the within Scope.

It will also require Generic Data Models to support one view of the truth for major entities, such as Traders or Customers.

This one view will be implemented as Master Data Management (MDM).

  • Get a broad understanding of the data available

  • Establish a common view of the Data Platform

  • Get a broad understanding of Data Sources

  • Determine the available Data

  • Choose the MDM product

  • Determine strategy for Clouds – e.g. Reference Data available globally

    • In 1 month, produce Generic Data Models

    • In 3 months, confirm GDM with sample data and Facilitated Workshops and choose MDM product.

    • In 6 months, implement MDM and publish GDM and CMI on the Intranet.

    • Adjust timescales in light of experience



Data Integration covers a number of Steps, each of which can have its own Templates.

Examples are included here for Data Profiling and Mapping Specifications.



  • Step 1. Start with Data Profiling because it is a good starting-point for determining the quality of the data and drafting some simple validation and transformation that can be used to get started.

For example, replace LTD by LIMITED (or vice versa), and ‘&’ by AND.

  • Step 2. Determine the available Data Models for major areas of the enterprise.

  • Step 3. Determine whether Generic Data Models are available to support one view of the truth for major entities, such as Customers or Offices.

  • This one view approach will be implemented as Master Data Management.

  • Step 5. Establish a common view of the Data Platform – Reference Data, Customers, products, Movements and so on.

  • Step 6. Determine the available Data

  • Step 7. Choose an MDM product or decide on in-house SQL development.





How.2 - How do we follow Best Practice

These Steps define a Tutorial of Best Practice :-

  • Step 1. Define the Target which is usually a ‘Single View Data Model’.

  • Step 2. Define the Data Sources

  • Step 3. Define the Mapping Specifications from the Sources to the Target.

  • Step 5. Define the Data Platform

  • Step 6. Identify Standards to be followed.



This Tutorial is described in detail in a separate document, entitled Data_Integration_Tutorial.doc

These questions come from this page :-

If you have a Question that is not addressed here, please feel free to email us your Question.


How.3 - How do we measure progress in Data Integration ?

Look for the existence of the following items :-

  • Generic Data Models

  • An Enterprise Data Platform

  • Identify the Data Sources

  • Selection of a MDM Product

  • Implementation of a Customer Master Index or appropriate alternative





How.4 - How do we get started ?

Data Profiling is a good starting-point for determining the quality of the data and drafting some simple validation and transformation that can be used to get started.

For example, replace LTD by LIMITED (or vice versa), and ‘&’ by AND.



The Design Approach requires Data Models for the areas of the within Scope.

It will also require Generic Data Models to support one view of the truth for major entities, such as Traders or Customers.

This one view will be implemented as Master Data Management (MDM).

  • Get a broad understanding of the data available

  • Establish a common view of the Data Platform

  • Get a broad understanding of Data Sources

  • Determine the available Data

  • Choose the MDM product

  • Determine a Strategy for Clouds – e.g. Reference Data available globally

    • In 1 month, produce Generic Data Models

    • In 3 months, confirm GDM with sample data and Facilitated Workshops and choose MDM product.

    • In 6 months, implement MDM and publish GDM and CMI on the Intranet.

Adjust timescales in light of experience



Data Integration covers a number of Steps, each of which can have its own Templates.

Examples are included here for Data Profiling and Mapping Specifications.




How.5 - How do I establish a Strategy for Data Quality ?

A successful Strategy for Data Quality as an Enterprise Issue must include both organization and technical aspects.


Typical Organization aspects are :-

  • Commitment from senior management

  • Establishing the slogan “Data Quality is an Enterprise Issue” as a top-down edict.

  • Identification of the ‘Top 20’ Applications and Data Owners across the Enterprise

  • Agree sign-off procedures with Data Owners and Users


Technical aspects

  • Establish Key Quality Indicators (‘KQIs), for example Duplicate Customers records

  • Agree target Data Quality percentage

  • Define KQI Reports and dashboards

  • Develop SQL to measure KQIs

  • Define procedures to improve KQIs


How.6 - How do I handle multiple types of Databases ?

This could include Oracle, SQL Server and DB2.

The key to handling multiple types of Database is to thing of them in terms of an Integrated

Data Platforms, where all types of data are presented in a common fashion.

This then defines the logical requirement.

There is a then a number of options to physically meet this logical requirement.

The Enterprise-level option is to use an appropriate commercial product, such as Informatica



2.7 Operational Data Store Layer

2.8.1 Best Practice

Best Practice involves establishing documentation standards.

For example, each table would be preceded by MIR_ to indicate that the structure of the table ‘Mirrors’ the structure of the Data Source.

A Staging Area will need to be established.



2.8.2 Blank Template



SOURCES

CONTACT

TYPE

DATA ITEMS

COMMENTS








2.8.3 Sample Completed Template

SOURCES

CONTACT

TYPE

DATA ITEMS

COMMENTS

CRM

Joe Bloggs

TBD

Customers

‘Golden Source’

Billing

Joe Bloggs

Oracle DB

Stores

Official Source

Operations







2.8.4 Frequently Asked Questions

What.1 – What are Data Sources ?

    • A Repository for all major Applications, Databases, Spreadsheets and so on.

    • Data and information related to each Stage in the Best Practice Road Map

    • This includes details of People, Roles and Responsibilities, Applications, Databases


These questions are from this page :-


Data Sources include all major places where important data is created or used, including :-

    • Applications

    • Databases

    • Spreadsheets

    • XML files, and so on

It also includes Information related to each Stage in the Best Practice Road Map on People, Roles and Responsibilities.


This Information is stored within an Information Catalog.


A Repository record Data Sources for all major Applications, Databases, Spreadsheets and so on, data and information related to each Stage in the Best Practice Road Map


Continue reading this ebook at Smashwords.
Download this book for your ebook reader.
(Pages 1-55 show above.)