|
Williams | Learn Data Modeling by Example - Part 1 |
112 |
First Edition: London, 2011
ISBN-13: 978-1466237414
This is Part 1 of our book has been produced in response to a number of requests from visitors to our Database Answers Web Site.
It is intended for beginners to Data Modeling
It incorporates a selection from our Library of about 950 data models that are featured on our Web site:
I hope you enjoy this Book and would be very pleased to have your comments at barryw@databaseanswers.org.
Barry Williams Principal Consultant Database Answers Ltd. London, England
This first Chapter is a tutorial on data modeling for young people. It provides an introduction to data modeling that we hope you find interesting and easy to read.
It covers the basic concepts and has a very user-friendly approach, featuring a teddy bear and kitten creating a data model on a trip as tourists to Windsor Castle, which is just outside London, England.
You can find this chapter as a tutorial on the Database Answers Web site:
In this tutorial, we will follow two young tourists as they visit Windsor Castle and create a data model.
Our
tourists are Dimple,
a 10-year-old girl, who likes sightseeing and ice cream
and Toby,
Dimple's 12-year-old brother, who likes sightseeing and designing
data models.
This is a tutorial on data modeling for young people that represents a typical data modeling project and illustrates the basic principles involved.
Data modeling is important because it is the foundation for so many activities:
It provides a vehicle for communication among a wide variety of interested parties, including management, developers, data analysts, DBAs and more.
A physical database can easily be generated from a data model using a commercial data modeling tool.
You will learn:
How to create a data model, starting from scratch.
What a typical data model looks like.
In this chapter, we will cover some basic concepts in data modeling:
Primary and Foreign Keys
One-to-Many and Many-to-Many Relationships
Hierarchies and Inheritance
Reference Data
1.3
Let’s go to Windsor
[Dimple]:
Toby, it's great being in London, which is so exciting and buzzing.


[Toby]:
I'm glad you like it, Dimple. What would you like to do today?
[Dimple]: Toby, we have seen Buckingham Palace, where the Queen
of the United Kingdom lives, and now I'd like to visit Windsor
Castle, because it's one of the most popular tourist attractions in
the UK, and it's just a short trip from London.
[Toby]: OK. Let's go...
We are starting from Buckingham Palace, where the Queen of the United Kingdom lives …
Toby and Dimple leave London and arrive at Windsor…
1.4
Arriving at
Windsor
[Dimple]
Wow, Toby, Windsor has a beautiful castle
and here is a royal
park
with lots of deer.

[Toby] Yes, Dimple, and when we look around there are so many banks, cafes, pubs, restaurants, shops, wine bars and hospitals!
The other thing that we see when we look around is people - lots of people.
So we can start thinking about our data model.

[Dimple]:
How do we get started?
[Toby]: Well, we know that we have
people and places.
The simplest start is to call all these places
establishments.
Then we simply have different kinds of establishments.
And
we have people - local people, tourists, students, people passing
through, people working here, people here on business and so on.
[Dimple]: Hmmm - so how do we translate what we know to help
us get started with our data model?
[Toby]: Let’s start a diagram with people and establishments.
This simple diagram is going to grow into a data model.
![]()
1.6
Identifiers and Primary Keys
[Dimple]:
Toby, I am one of these people so how
do I create a unique identity for myself to make me different from
everybody else?
[Toby]: We will give every person a unique
identifier
and every establishment its own unique Identifier.
When we use
these we call them Primary
Keys,
and show them in the diagram with a PK
on the left-hand side.
[Dimple]: That sounds good, Toby, but
I don't know what it means.
[Toby]: Well, Dimple, let's look
at how we use these identifiers...

Lots of people visit establishments like Starbucks in Windsor ;0)

1.7
Relationships and Foreign Keys
[Toby]:
Dimple, now we can add some interesting details because we know that
one person can visit many establishments.
We also know that one
establishment is visited by many tourists.
Then we call this a
many-to-many
relationship
between people and establishments.
To make it easier for you
to understand I have expanded the many-to-many
relationship
into two different things, which are called one-to-many
relationships.
[Dimple]: So Toby, is that like saying that one person can
make many visits to many establishments?
[Toby]: Yes, Dimple
- that's great - and we can also say that one establishment can have
visits from many people.
At this point, we can show how all
these boxes are related, and that is a very big step, because it
takes us to the idea of 'relationships'.
We can call these boxes tables - or entities if we want to speak to professional data modelers.
A table simply stores data about one particular kind of ‘Thing of Interest’.
For example, people or establishments.
Each record in a table will be identified by its own unique identifier, which we call the primary key.
It is not usually easy to find a specific item of data already in the table that will always be unique.
For example, in the United States, Social Security Numbers (SSNs) are supposed to be unique, but (for various legitimate reasons) that is not always the case.
Also, foreign visitors and tourists will not have SSNs.
Therefore, it is best practice to create a new field just for this purpose.
This will be what is called an auto-increment data type, which will be generated automatically by the Database Management System (DBMS) at run-time.
This is called a surrogate key and it does not have any other purpose.
It is simply a key that stands for something else.
It is a meaningless integer that is generated automatically by the database management software, such as Oracle or SQL Server. The values are usually consecutive integers, starting with 1,2,3,4 and so on.
Now we can see how useful our identifiers can be because we can include the person and establishment identifiers in our Visits Table.
Then
the Person_ID field becomes a link to a record for a person in the
Person Table.
This link is what is called a Foreign
Key
and we can see it's shown with 'FK'
on the left-hand side.

1.8
Products and Product Types
[Dimple]:
Toby, when we go into a shop we want to buy something.
And there
are thousands and thousands of possibilities.
How do we deal with
all that in our little data model?
[Toby]: Well Dimple, it's
really quite easy. It's like all our modeling where we look for
simple patterns that cover many situations.
[Dimple]: Hmm - I
don't know what that means. Maybe if you showed me I might understand
it.
[Toby]: OK.
Everything that we buy is called a
product,
and all we have to do is simply define the type of each product -
such as a coffee, muffin or a newspaper.
Then
we draw a little box called Products
and say that every product has a type.
In other words, there is a
relationship between the Products
and Product_Types
boxes.
The
lines are called relationships
and they are very important in data modeling.
We are now creating
an Entity-Relationship Diagram or "ERD".
This diagram shows only a line for the relationship:
![]()
The symbol at the products end is called crow’s feet and it shows the many end.
The short straight line at the Product_Types end shows the one end.
In other words, this line shows a one-to-many relationship.

Dimple, let me explain about the dotted line. It means that the relationship results in a ‘Foreign Key’ in the products table. This is shown by the ‘FK’ symbol next to the product_type_code field and it means that there is a link back to the Product_Types.
However, the primary key is only the Product_ID, and of course, this is shown by the ‘PK’ symbol next to the Product_ID field.
Later, when we talk about inheritance, we will use a straight line, in contrast to this dotted line here. This is to show that the foreign key field is also a primary key.
I have to say something a bit difficult about primary keys right now.
In the Products Table, we have to allow for a very large number of products being stored.
Therefore we use an ID field for the primary key.
We then create this ID field automatically as a number (called an auto-increment integer).
This number has no meaning and is simply used to identify each record uniquely among possibly millions or hundreds of millions.
However, things are different for ‘type’ fields.
These are what we call enumerated data and are typically reference data.
They are always relatively small in number and we choose a code for the primary key because we can create them and review them manually.
It also helps us to create a code that we can use and refer to, in contrast to the ID fields that have no meaning.
Typical examples would be:
Sizes – Small, Medium and Large where we are accustomed to seeing S,M and L.
Gender – Male and Female, where we use M and F.
This menu board at Starbucks shows lots of products.
We know that they are organized in groups, like food and drink, and each of these has more groups and so on, right down to the particular product, like caramel macchiato or a panini.
This top-down organization is called a hierarchy and appears all over the place.
Luckily we can show this very easily and neatly in our data model.

1.9
Products, Types and Product Hierarchies
[Dimple]:
Toby, when we look closely at the menu board to try to decide what to
order we can see lots of possibilities. But after a while we can see
a pattern that helps us decide.
How do we deal with all that in
our little data model?
[Toby]: Well Dimple, it's really quite
easy.
We define something called a hierarchy.
Hierarchies are very common and simply mean any situation where
there are parents, children, grandchildren and so on.
If we look
at the Starbucks menu board on the right-hand side we can see a
simple example of 'espresso' and under it a number of different
drinks.
My favorite is caramel macchiato.
So in this case,
the top-level of our hierarchy is a product category called espresso,
and the next level down is a product called caramel macchiato.
[Dimple]: OK. That sounds OK.
[Toby]: Finally, we
show this hierarchy by a dotted line in the top-right hand corner in
the entity called 'Ref_Types_of_Products'.
This is formally called a recursive or reflexive relationship and is informally called rabbit ears.

1.10
Types of People
[Dimple]:
Toby, that looks OK.
I guess we can deal with types of people the
same way, can we?
[Toby]: Yes, Dimple, and types of
establishments as well.
[Dimple]: OK, that sounds sensible.
And do they use these identifiers in a database?
[Toby]: Yes,
and what is even better is that the database will automatically
generate a new unique Identifier for you and your visits and
purchases if you want to get a refund later.
1.11
Types of People and Establishments
[Dimple]:
I see, Toby.
I guess we can deal with types of establishments the
same way, can we?
[Toby]: Yes, Dimple.
[Dimple]: OK,
that sounds sensible. And do they use these identifiers in a
database?
[Toby]: Yes, and we can use our new unique
identifier for you and your visits and purchases in case we want to
keep track of things.
Like
maybe you want to get a refund later so we need to get your details
from the database.
[Toby]:
Before we move on, let's talk about establishments.
One special
thing about Windsor is that it has a castle where the Queen lives and
a very large royal park, where she keeps deer.
But when we think
about these things, we find that we can simply fit them into our
definition of establishments.

1.12
Visits and Purchases:
Here
we can see many visitors to Windsor’s Royal Shopping Arcade.

[Dimple]:
Toby, with so many people, establishments and purchases how do they
keep track of everything?
[Toby]:
Well, Dimple, by this time, everything has its own identifier that is
used wherever they need to keep track.
[Dimple]:
OK, that sounds sensible. And do they use these identifiers in a
database?
[Toby]: Yes, Dimple, and in this diagram, we can see that we can use the unique identifiers that are shown as ‘PK,’ for Primary Keys.
We can see that we have a PK for every entity or table so we can be pretty sure we can get from any table to any other table.
This is called navigating around the data model and is a good test for a well-designed data model.

1.13
People and Inheritance
[Toby]:
Dimple, let's take a closer look at the different types of people we
can find in Windsor.
[Dimple]: OK, Toby. I hope I don't have
to think too much because I might get a headache?
[Toby]: No,
Dimple, I will do the thinking and talking and all you have to do is
nod your head when you understand.
[Dimple]: OK, Toby. I
promise to do that.
[Toby]: We already said that we have
local people and tourists.
There are always lots and lots of people visiting Windsor Castle.
When we look at this picture, we can see ceremonial guards in ceremonial red uniforms, and a big crowd, with mainly tourists but also staff in shops responsible for controlling the crowd, tourists, local people and so on.

Some
of these local
people are shoppers and some of them will be working in the shops.
We will call the workers staff
and we know different things about them than the things we know about
the tourists.
For
example, we will probably know the gender
of everybody just by looking at them.
For staff, we will usually
also know their date of birth and their home address.
In data modeling we have a very powerful approach that we call Inheritance that we can use here.
If
we want to describe this in English, we would say that staff
inherit the People_Type_Code and gender from the parent entity of
people, and in addition, they have a date of birth and home address.
For tourists, we don’t know much, except for the date of their visit, and maybe, if they buy something in a shop using a credit card, then the shop would know the credit card details.
For the ceremonial guards in red uniforms, we can tell their rank by looking at their uniform and maybe it would also tell us which unit of the army they belong to.
Does
that make sense, Dimple?
[Dimple]: I think so, Toby.
Is
it like saying that we inherit having two arms and two legs from our
parents because they have two arms and two legs, but that we have
also have things that are just us?
[Toby]: Yes, Dimple -
that's great - let's take a break and do some shopping!
[Dimple]:
I like the sound of that, Toby. Can I have an ice cream?
[Toby]: Yes, of course, Dimple – this diagram shows we are doing well.
It
show inheritance between people and the three different types of
people:

We can see a field marked as ‘PF’ in the three tables for ceremonial guards, staff and tourists.
This is unusual because it means a field that is a Primary Key in the three tables and also a Foreign Key to the People Table.
Therefore, if your first record was a ceremonial guard, then we would have a record in the People Table with a Person_ID of 1 and a record in the ceremonial guard with a Guard_ID of 1.
Similarly, if our second record was a member of staff, we would have a record in the People Table with a Person_ID of 2 and a record in the Staff Table with a Staff_ID of 3.
1.14
Staff, Establishments and Derived Fields
[Dimple]:
Toby, how do we specify that staff must work in some establishment?
[Toby]: Dimple, that's a very good question.
Fortunately,
the answer is very easy.
We add a one-to-many relationship between
the staff and establishment entities.
In English, we would say
that every member of staff must work in one establishment and every
establishment can employ many members of staff.
In
the diagram, we show this with a foreign
key
by the Establishment_ID field in the staff entity.
So if we look
closely at the staff entity, we will see ‘FK’
by the Establishment_ID field.
[Dimple]: OK, that sounds
good, and I can see how the identifiers are very important.
[Toby]:
I am glad to hear it, Dimple.
There is one more thing I have to
say.
We are learning data modeling and one important thing about
data modeling is that it has to follow a set of rules.
These rules help us to produce good data models and so they are
very important.
One of the rules is that we cannot include any
bits of data that can be derived from any other bits of data.
For
example, we usually want to know how many people work in a shop or
cafe.
Therefore we include a staff
count
field with the establishment.
But when it comes to finding the
value that goes in here, we will count the records in the Staff Table
for each establishment.
Therefore, it's a derived
field
and we call it a name that starts with 'derived_' to make things
clear.
This is because, according to the rules, we should not
include derived fields in our data model at this early stage.
I have shown it here simply as an example because it is a situation that occurs quite often so it’s good to recognize it when you see it.
Does
that sound sensible, Dimple?
[Dimple]: I suppose so, Toby. But
I've got a headache, can we go for an ice cream now?

1.15
Reference Data
[Toby]:
Dimple, you can see that I am using a Gender Table and People Types
Table.
I have given them both names that begin with 'ref_' to
make it clear that they are reference data.
This means that the
values don't change much and I can use them to define what the valid
values can be.
This is a technique that professional data
modelers use but we don't need to worry about it today.
[Dimple]:
I'm glad to hear it, Toby!
Although it isn’t difficult to
understand and it seems like a good idea.
[Toby]: In our small example, we have only four kinds of reference data altogether - gender, types of establishment, people and products.

1.16
Bringing it all Together
[Toby]:
Dimple, if we bring together everything we have talked about, we will
see that we have quite a good data model that any professional would
be proud of.
[Dimple]: OK, Toby. Do you think I will
understand it?
[Toby]: Let me help you by making a list of
the business
rules
for our model:
People can be either ceremonial guards, staff or tourists.
There are a number of establishments of different types.
Tourists can make visits to establishments and make purchases.
Staff assist the tourists when they make a purchase.
A purchase involves one product.
[Toby]:
OK, Dimple - we have a very nice data model and now we can take the
break I promised you.
[Dimple]: That's great, Toby - can I
have an ice cream?
[Toby]: Sure, but before we do I should
say something about PF,
which appears in the Staff Table.
It's unusual and it's
called PF
because it means a field which is a Primary
Key in the Staff Table and a Foreign
Key to the People Table.
[Dimple]: Hmmm, I've got a headache,
Toby - can we please go and get an ice cream?
[Toby]: OK,
Dimple. You've been a very good girl and you deserve a break.
You can admire what we have created, which is this very professional-looking data model.
1.17
Top-Level Model with Names Only
We
can show our data model at the top-level, showing only the names of
the ‘things of interest,’ which we call entities or tables if we
are thinking about a database.
This is suitable for explaining what we saw in Windsor to our family or friends.
If we wanted to describe it, we could simply say:
There are lots of people in Windsor, including ceremonial guards, staff and tourists.
There are also lots of establishments, like shops and the Castle.
Tourists made visits to establishments where they made purchases of products.

1.18
Top-Level Model with Key Fields
This
is what our data model looks like if we show key fields only and
leave out the Reference Data Tables.
This level of display is suitable if we want to confirm to each other how the tables (or entities) are related.

Finally, this is what our data model looks like if we show the key fields, all the data items only and the Reference Data Tables.
You can see that the amount of detail involved makes it more difficult to understand what’s going on and to identify what is important.
This level of display is suitable if we want to talk about details and develop a database from our data model.

1.20
Ice Cream
[Toby]:
Dimple, I've got some wonderful news for you.
[Dimple]: I'm
glad to hear it, Toby - what is it?
[Toby]: I have found your
favorite Baskin-Robbins
ice cream here in Windsor ;)
[Dimple]: Toby, are you teasing
me?
[Toby]: No, Dimple - look, there it is across the road
from Windsor Castle!
[Dimple]: Wow - that's great, so I can
have my favorite butter
pecan ice cream.

In this chapter, we have learned how to think like a data modeler and how to gradually put together a data model in our heads.
We know that if we get in the habit of doing this regularly it gets easier and more natural and soon we will be seeing the world around us as pieces of a data model that we can fit together like a jigsaw puzzle.
I hope you have found this tutorial fun and useful.
I would be very pleased to have your comments – do you like this chapter or are there any changes you would recommend?
You can email me at barryw@databaseanswers.org.

Frederiksborg Castle in Denmark

In this tutorial, we will follow two young tourists as they visit Turkey, which is a country with a tremendous history and very popular with tourists looking for something special.
Our
tourists
are Dimple,
a 10-year old girl, who likes sightseeing and ice cream
and Toby,
Dimple's 12-year-old brother, who likes sightseeing and designing
data models.
This is a tutorial on data modeling for young people that represents a typical data modeling project and illustrates the basic principles involved.
Data modeling is important because it is the foundation for so many activities:
It provides a vehicle for communication among a wide variety of interested parties, including management, developers, data analysts, DBAs and more.
A physical database can easily be generated from a data model using a commercial data modeling tool.
You will learn:
How to create a data model, starting from scratch
The important design principles involved
What a typical data model looks like
In this chapter, we will cover some basic concepts in data modeling:
Primary and Foreign Keys
One-to-Many and Many-to-Many Relationships
Hierarchies and Inheritance
Reference Data
3.3
Let’s get started
[Toby]:
We have just arrived in Turkey. What would you like to do today?
[Dimple]: Toby, It's great being in Turkey which is so exciting and has so many things to see and enjoy.

[Toby]:
I'm glad you like it, Dimple. What would you like to do today?
[Dimple]: Toby, we have come to Turkey, and I would like to see
Istanbul and visit the Blue Mosque, because it's one of the most
popular tourist attractions here, then I would like to do some
shopping, then see the sea, and I would like to finish up at
Starbucks.
[Toby]:
OK. Let's go.
We are starting from Istanbul, which is a beautiful place…

Toby and Dimple leave England and arrive in Turkey…
3.4
Arriving at Istanbul
[Dimple]
Wow, Toby, look at all these people.
[Toby] Yes, Dimple, when we look around there are so many people, shops, banks and so on!
So we can start thinking about our data model.

[Dimple]:
How do we get started?
[Toby]: Well, we know that we have
people and places.
The simplest start is to call all these places
establishments.
Then we simply have different kinds of establishments.
And
we have people
- local people, tourists, students, people passing through, people
working here, people here on business and so on.
[Dimple]:
Hmmm - so how do we translate what we know to help us get started
with our data model?
[Toby]: Let’s start a diagram with people and establishments.
This simple diagram is going to grow into a data model.
![]()
3.6
Identifiers and Primary Keys
[Dimple]:
Toby, I am one of these people so how
do I create a unique identity for myself to make me different from
everybody else?
[Toby]: We will give every person a unique
identifier
and every establishment its own unique identifier.
When
we use these we call them Primary
Keys,
and show them in the diagram with a PK
on the left-hand side.
[Dimple]: That sounds good, Toby, but
I don't know what it means.
[Toby]: Well, Dimple, let's look
at how we use these identifiers...

We have managed to find a quiet area where a very happy man is selling a Turkish favorite, called SIMIT ;0)
So, in other words, we have one person, who is the happy man, and one establishment, which is his simple stall.
So
we can create a people record with a person ID of 1 and an
establishments record for the stall, with an establishment ID of 3.

3.7
Relationships and Foreign Keys
[Toby]:
Dimple, now we can add some interesting details because we know that
one person can visit many establishments.
We also know that one
establishment is visited by many tourists.
Then we call this a
many-to-many
relationship
between people and establishments.
To make it easier for you
to understand I have expanded the many-to-many
relationship
into two different things, which are called one-to-many
relationships.
[Dimple]: So Toby, is that like saying that one person can
make many visits to many establishments?
[Toby]: Yes, Dimple
- that's great - and we can also say that one establishment can have
visits from many people.
At this point, we can show how all
these boxes are related, and that is a very big step, because it
takes us to the idea of 'relationships'.
We can call these boxes tables - or entities if we want to speak to professional data modelers.
A table simply stores data about one particular kind of ‘Thing of Interest’.
For example, people or establishments.
Each record in a table will be identified by its own unique identifier, which we call the Primary Key.
It is not usually easy to find a specific item of data already in the table that will always be unique.
For example, in the States, Social Security Numbers are supposed to be unique, but (for various legitimate reasons) that is not always the case.
Also, foreign visitors and tourists will not have SSNs.
Therefore, it is best practice to create a new field just for this purpose.
This will be what is called an auto-increment data type, which will be generated automatically by the Database Management System (DBMS) at run-time.
This is called a surrogate key and it does not have any other purpose.
It is simply a key that stands for something else.
It is a meaningless integer that is generated automatically by the database management software, such as Oracle or SQL Server, The values are usually consecutive integers, starting with 1,2,3,4 and so on.
Now we can see how useful our identifiers can be because we can include the person and establishment identifiers in our visits table.
Then
the Person_ID field becomes a link to a record for a person in the
Person Table.
This link is what is called a Foreign
Key
and we can see it's shown with 'FK'
on the left-hand side.

3.8
Products and Product Types
[Dimple]:
Toby, when we go into a shop we want to buy something.
And there
are thousands and thousands of possibilities.
How do we deal with
all that in our little data model?
[Toby]: Well Dimple, it's
really quite easy. It's like all our modeling where we look for
simple patterns that cover many situations.
[Dimple]: Hmm - I
don't know what that means. Maybe if you showed me I might understand
it.
[Toby]: OK.
Everything that we buy is called a
product, and all we have to do is simply define the type of each
product - such as a coffee, muffin or a newspaper.
Then
we draw a little box called Products
and say that every product has a type.
In other words, there is a
relationship between the Products
and Product_Types
boxes.
The
lines are called relationships
and they are very important in data modeling.
We are now creating
an Entity-Relationship
Diagram
or ’ERD’.
This diagram shows only a line for the relationship:
![]()
The symbol at the products end is called crow’s feet and it shows the many end.
The short straight line at the Product_Types end shows the one end.
In other words, this line shows a one-to-many relationship.

Dimple, let me explain about the dotted line. It means that the relationship results in a foreign key in the Products Table. This is shown by the ‘FK’ symbol next to the product_type_code field and it means that there is a link back to the Product_Types.
However, the primary key is only the Product_ID, and of course, this is shown by the ‘PK’ symbol next to the Product_ID field.
Later, when we talk about inheritance, we will use a straight line, in contrast to this dotted line here. This is to show that the foreign key field is also a primary key.
I have to say something a bit difficult about primary keys right now.
In the Products Table, we have to allow for a very large number of products being stored.
Therefore we use an ID field for the Primary key.
We then create this ID field automatically as a number (called an auto-increment integer).
This number has no meaning and is simply used to identify each record uniquely among possibly millions or hundreds of millions.
However, things are different for type fields.
These are what we call enumerated data and are typically reference data.
They are always relatively small in number and we choose a code for the primary key because we can create them and review them manually.
It also helps us to create a code that we can use and refer to, in contrast to the ID fields that have no meaning.
Typical examples would be:
Sizes – Small, Medium and Large where we are accustomed to seeing S,M and L.
Gender – Male and Female, where we use M, F and U for Unknown.
This menu board shows a typical menu in a Turkish restaurant that serves a wide range of food and drink.
We can see that they are organized in groups, like desserts and hot and cold drinks, and each of these has products, like apple baklava or turkish coffee.
This top-down organization is called a hierarchy and appears all over the place in our world.
Luckily we can show this very easily and neatly in our data model.

3.9
Products, Types and Product Hierarchies
[Dimple]:
Toby, when we look closely at the menu to try to decide what to order
we can see lots of possibilities
But after a while we can see
a pattern that helps us decide.
How do we deal with all that in
our little data model?
[Toby]: Well Dimple, it's really quite
easy.
We define something called a hierarchy.
Hierarchies are very common and simply mean any situation where
there are parents, children, grandchildren and so on.
If we look
at this menu board at the top we can see headings saying ‘Desserts’
and below ‘Hot and Cold Beverages’.
So
in this case, the top-level of our hierarchy is ‘Food
and Drink’.
‘Food’ has just one category, which is ‘Desserts’.
‘Drink’ has two categories, which are ‘Cold Drinks’ and ‘Hot Drinks’.
Then ‘Cold Drinks’ has two categories, which are ‘Apple-Tree’ and ‘Other’.
MENU

DESSERTS HOT
BEVERAGE LIST COLD
BEVERAGE LIST
Apple
Tree Juices Other
Cold Beverages

[Dimple]:
I think I understand that, it sounds OK.
[Toby]:
Finally, we show this hierarchy by a dotted line in the top-right
hand corner in the entity called 'Ref_Types_of_Products'.
This is formally called a Recursive or Reflexive relationship and is informally called rabbit ears.

3.10
Types of People
[Dimple]:
Toby, that looks OK.
I guess we can deal with types of people the
same way, can we?
[Toby]: Yes, Dimple, and types of
establishments as well.
[Dimple]: OK, that sounds sensible.
And do they use these identifiers in a database?
[Toby]: Yes,
and what is even better is that the database will automatically
generate a new unique identifier for you and your visits and
purchases if you want to get a refund later.
3.11
Types of People and Establishments
[Dimple]:
Toby, that looks OK.
I guess we can deal with types of
establishments the same way, can we?
[Toby]: Yes, Dimple.
[Dimple]: OK, that sounds sensible. And do they use these
identifiers in a database?
[Toby]: Yes, and we can use our
new unique identifier for you and your visits and purchases in case
we want to keep track of things.
Like
maybe you want to get a refund later so we need to get your details
from the database.
[Toby]:
Before we move on, let's talk about establishments.
In Turkey,
there are many different kinds of establishments, like shops, banks,
cafes, restaurants, hotels, hospitals, garages and so on.
But
when we think about these things, we find that we can simply fit them
into our definition of establishments and identify them as different
types of establishments.

3.12
Visits and Purchases:
Here
we can see two visitors taking to a stallholder in a Bazaar.

[Dimple]:
Toby, with so many tourists, stalls, shops and things to buy, how do
we keep track of everything?
[Toby]: Well, Dimple, by this
time, everything has its own identifier that is used wherever they
need to keep track.
[Dimple]: OK, that sounds sensible. And
do we use these identifiers in a database?
[Toby]: Yes, Dimple, and in this diagram, we can see that we can use the unique identifiers that are shown as ‘PK,’ for primary keys.
We can see that we have a PK for every entity or table so we can be pretty sure we can get from any table to any other table.
This is called navigating around the data model and is a good test for a well-designed data model.

3.13
People and Inheritance
[Toby]:
Dimple, let's take a closer look at the different types of people we
can find in Istanbul.
[Dimple]: OK, Toby. I hope I don't have
to think too much because I might get a headache?
[Toby]: No,
Dimple, I will do the thinking and talking and all you have to do is
nod your head when you understand.
[Dimple]: OK, Toby. I
promise to do that.
[Toby]: We already said that we have
local people and tourists.
There are always lots of people visiting the Blue Mosque.
When we look at this typical street scene, we can see shoppers, stallholders, workers and local people.

We
usually know
different things about the stallholders and workers than the things
we know about the tourists.
For example, we will probably
know the gender of everybody just by looking at them.
For
workers, we might also know things related to their employment, such
as their date of birth and their home address.
In
data modeling we have a very powerful approach that we call
Inheritance
that we can use here.
If we want to describe this in English,
we would say that staff inherit the People_Type_Code and gender from
the parent entity of people, and in addition, they have a date of
birth and home address.
For tourists, we don’t know much, except for the date of their visit, and maybe, if they buy something in a shop using a credit card, then the shop would know the credit card details.
Does
that make sense, Dimple?
[Dimple]: I think so, Toby.
Is
it like saying that we inherit having two arms and two legs from our
parents because they have two arms and two legs, but that we have
also have things that are just us?
[Toby]: Yes, Dimple -
that's great - let's take a break and do some shopping!
[Dimple]:
I like the sound of that, Toby. Can I have an ice cream?
[Toby]: Yes, of course, Dimple – this diagram shows we are doing well.
It shows inheritance between people and the two different types of people:

We can see a field marked as ‘PF’ in the tables for staff and tourists.
This is unusual because it means a field that is a Primary Key in the three tables and also a Foreign Key to the People Table.
Therefore, if your first record was a member of staff, then we would have a record in the People Table with a Person_ID of 1 and a record in the staff table with a Staff_ID of 3.
Similarly, if our second record was a tourist, we would have a record in the Person Table with a Person_ID of 2 and a record in the tourist table with a Staff_ID of 3.
3.14
Staff, Establishments and Derived Fields
[Dimple]:
Toby, how do we specify that staff must work in some establishment?
[Toby]: Dimple, that's a very good question.
Fortunately,
the answer is very easy.
We add a one-to-many relationship
between the staff and establishment entities
In English, we would
say that every member of staff must work in one establishment and
every establishment can employ many members of staff.
In the
diagram, we show this with a Foreign
Key
by the Establishment_ID field in the staff entity.
So if we look
closely at the staff entity, we will see 'FK'
by the Establishment_ID field.
[Dimple]: OK, that sounds
good, and I can see how the identifiers are very important.
[Toby]:
I am glad to hear it, Dimple.
There is one more thing I have to
say.
We are learning data modeling and one important thing about
data modeling is that it has to follow a set of rules.
These rules help us to produce good data models and so they are
very important.
One of the rules is that we cannot include any
bits of data that can be derived from any other bits of data.
For
example, we usually want to know how many people work in a shop or
cafe.
Therefore we include a staff
Count
field with the establishment.
But when it comes to finding the
value that goes in here, we will count the records in the Staff Table
for each establishment.
Therefore, it's a derived
Field
and we call it a name that starts with 'derived_' to make things
clear.
This is because, according to the rules, we should not
include derived fields in our data model at this early stage.
I have shown it here simply as an example because it is a situation that occurs quite often so it’s good to recognize it when you see it.
Does
that sound sensible, Dimple?
[Dimple]:
I suppose so, Toby. But I've got a headache, can we go to Starbucks
now?

3.15
Reservations and Generic Data Models
[Toby]:
Dimple, this bit is quite hard-going so if you want to take a rest,
that’s OK.
[Dimple]:
OK, Toby, I will just sit quietly and watch the people ;0)
[Toby]: People make reservations every day all around the world.
These reservations have a lot in common:
Hotel bookings, airline bookings, theatres and shows, appointments to see a doctor or dentist and so on.
The basic common things are a date and time, usually a specific facility, like a hotel, an airline seat, a theatre and so on.
This means that we can identify what they have in common and what they have that is different and specific to the type of appointment.
3.12.1
Reservations for a Hotel
Here
is a very beautiful and unique hotel in the caves at Yunak Evleri
(about 400 km from Istanbul), which has rooms dating back to the 5th
century.
For a hotel, of course, you would book for a specific night (or night) and maybe a non-smoking room but that is about all.

Hotel in the Caves at Yunak Evleri, Turkey
3.12.2
Reservations for Whirling Dervishes
A
very unusual spectacle that is unique to Turkey is the sight of
Whirling Dervishes dancing.
It is part of the Muslim religious practice of the disciples
For this reservation, you would book for a specific show and a seat at the price you wanted.

3.12.3
Generic Data Model for Reservations
In
this model, we define a facility to be what we are making a
reservation for.
This data model is shown on this page of our Database Answers Web site:

3.16
Reference Data
[Toby]:
Dimple, you can see that I am using a Gender Table and People Types
Table.
I have given them both names that begin with 'ref_' to
make it clear that they are reference data.
This
means that the values don't change much and I can use them to define
what the valid values can be.
This is a technique that
professional data
modelers use but we don't need to worry about it today.
[Dimple]:
I'm glad to hear it, Toby!
Although it isn’t difficult to
understand and it seems like a good idea.
[Toby]: In our small example, we have only four kinds of reference data altogether - gender, types of establishment, people and products.

3.17
Bringing it all Together
[Toby]:
Dimple, if we bring together everything we have talked about, we will
see that we have quite a good data model that any professional would
be proud of.
[Dimple]: OK, Toby. Do you think I will
understand it?
[Toby]: Let me help you by making a list of
the business
rules
for our model:
People can be either staff or tourists.
There are a number of establishments of different types.
Tourists can make visits to establishments and make purchases.
Staff assist the tourists when they make a purchase.
A purchase involves one or more products.
[Toby]:
OK, Dimple - we have a very nice data model and now we can take the
break I promised you.
[Dimple]:
That's great, Toby - can we go to Starbucks?
[Toby]: Sure,
but before we do I should say something about PF,
which appears in the Staff Table.
It's unusual and it's
called PF
because it means a field which is a Primary
Key in the Staff Table and a Foreign
Key to the People Table.
[Dimple]: Hmmm, I've got a headache,
Toby - can we please go to Starbucks?
[Toby]: OK, Dimple.
You've been a very good girl and you deserve a break.
You can admire what we have created, which is this very professional-looking data model.
3.18
Top-Level Model with Names Only
We
can show our data model at the top-level, showing only the names of
the ‘things of interest,’ which we call entities or tables if we
are thinking about a database.
This is suitable for explaining what we saw in Windsor to our family or friends.
If we wanted to describe it, we could simply say:
There are lots of people in Windsor, including ceremonial guards, staff and tourists.
There are also lots of establishments, like shops and the Castle.
Tourists made visits to establishments where they made purchases of products.

3.19
Top-Level Model with Key Fields
This
is what our data model looks like if we show Key fields only and
leave out the Reference Data Tables.
This level of display is suitable if we want to confirm to each other how the tables (or entities) are related.

3.20
Top-Level Model with all Details
Finally,
this is what our data model looks like if we show the key fields, all
the data items only and most of the Reference Data Tables.
You can see that the amount of detail involved makes it more difficult to understand what’s going on and to identify what is important.
This level of display is suitable if we want to talk about details and develop a database from our data model.

3.21
Starbucks
[Toby]:
Dimple, I've got some wonderful news for you.
[Dimple]: I'm
glad to hear it, Toby - what is it?
[Toby]: I have found
Starbucks here in Turkey, so you can have your favorite things to eat
or drink ;)
[Dimple]: Toby, are you teasing me?
[Toby]:
No, Dimple - look, there it is across the road from the Blue Mosque!
[Dimple]: Wow - that's great, so I can have my favorite
muffin.

In this chapter, we have learned how to think like a data modeler and how to gradually put together a data model in our heads.
We know that if we get in the habit of doing this regularly it gets easier and more natural and soon we will be seeing the world around us as pieces of a data model that we can fit together like a jigsaw puzzle.
This chapter discusses the basic concepts in data modeling.
It builds through a series of structured steps in the development of a data model.
This chapter covers the basic concept that provide the foundation for the data model that we designed in similar material to Chapter 1 but it is more serious and more comprehensive.
This material is also available as a tutorial for Amazon and Starbucks on the Database Answers Web site –
We will cover these basic concepts:
Creating Entities
Primary and Foreign Keys
One-to-Many and Many-to-Many Relationships
Hierarchies
Inheritance
Reference Data
At the end of this tutorial, we will have produced a data model, which is commonly referred as an Entity-Relationship Diagram, or 'ERD'.
This chapter is a description of the relational theory as originally established by Ted Codd, who, at the time, was a research scientist with IBM.
The basic concepts are important because the relational theory is very powerful and provides a sound theoretical foundation for databases that have become essential since their first appearance in the early 1970s.
They were the creation of a brilliant research scientist called Ted Codd, who was working for an IBM Research Lab at the time. It is reported that he faced internal criticism initially because it was considered that his new idea would affect sales of established IBM database products.
It is the foundation for so many activities:
It provides a vehicle for communication among a wide variety of interested parties, including management, developers, data analysts, DBAs and more.
A physical database can easily be generated from a data model using a commercial data modeling tool.
You will learn:
How to create a data model, starting from scratch.
What a typical data model looks like.
Our photo shows a typical Starbucks. If we look closely, we can see people eating, drinking and placing orders. What Starbucks sees are customers, products and orders being met.
During the course of this book we will see how data models can help to bridge this gap in perception and communication.

Getting
Started:
The
area we have chosen for this tutorial is a data model for a simple
Order
Processing System
for Starbucks.
We
have done it this way because many people are familiar with Starbucks
and it provides an application that is easy to relate to.
We think about the area we are going to model.
We can see customers ordering products (food, drinks and so on).
Our approach has three steps:
Establish the scope of the data model.
Identify the 'things of interest' that are within the scope, These will be called entities.
Determine the relationships between them.
Deciding
the Scope of Our Data Model
When we step inside, we see that Starbucks sells a wide range of products, so our first task is to decide which of them should be included in our data model.
Right now, we are interested only in something to eat and something to drink. Therefore, all the mugs and other items shown in this picture on the left, are outside the scope of our data model, and are not 'Things of Interest'.
Our first step is to decide what things are we interested in.
In other words, what is the scope of our data model?

These things will be called ‘Entities in a Data Model’ and ‘Tables in a Database’.
4.4 Creating Entities
Dezign is a data modeling tool that I use extensively because it is very good and very affordable.
You can download a free trial from this Web site:
Here is a list of modeling tools on the Database Answers Web site:
http://www.databaseanswers.org/modelling_tools.htm
This
is how you create an entity
in the Dezign data modeling tool:
1.
Right-click on a blank area in the diagram
4. From the drop-down
list, choose Insert
and Entity
3. Check the PK
box
for the primary key attribute, which will usually be the first one on
the entity.
7. Click on Close
to save the results.
4.5
Primary Keys
We
decide that the things we are interested in are customers, orders and
products.
You can buy a range of products in Starbucks, including souvenir mugs, coffee and newspapers.
For the purpose of our first model, we restrict our products to food and drink.
This diagram shows the corresponding entities with primary keys.

At this stage, we show only the entities with no relationships and minimum attributes and specify only the primary key and one details field that will be replaced later on.
The Primary Key field(s) should always be first.
You will notice that the first field in the Customers_version2 Table is the Customer_ID.
It has a PK symbol beside it, which indicates that it is the primary key for the table.
The primary key is very important and is the way that we can recognize each individual record in the table.
Creating
a primary
key in the Dezign tool:
1. Right-click on the Entity
3. Choose Attributes
6. Check the PK
box for the primary key attribute, which will usually be the first
one on the entity.
7. Click on Close
to save the results.
4.6
Foreign Keys
This
diagram shows entities with foreign keys.
Customer_ID is a foreign key that links orders to customers.

Here we have added the relationships between the entities.
When this primary key is used in another table, it is referred to as a foreign key.
We can see a good example in this diagram, where the Customer_ID appears in the Orders Table as a foreign key.
This is shown with an 'FK' symbol beside it.
Mandatory Key Fields
A foreign key is usually mandatory. For example, a value for a Customer_ID in the Customers_Payment_Methods Table must correspond to an actual value of the Customer_ID in the Customers_Version_1 Table. This is shown in the diagram by the short straight line at the end of the dotted line close to the Customers Table.
Foreign
Keys in the Dezign Tool
Foreign
keys are created automatically when you make a relationship between
two entities.
We recommend that you move the field up in the
entity so that it takes it place alphabetically among the key fields.
To
do this, right-click on the entity,
choose the Attributes
option, then click on the up or down arrow on the right-hand side.
4.7
One-to-Many Relationships

In this diagram, a customer can place zero, one or many orders.
This defines a one-to-many relationship.
This is shown by the symbol that has three small lines at that end of the relationship dotted line, which is referred to as crow's feet.
Optional Key Fields
Strictly speaking, a customer does not have to place an order. He or she could change their mind and walk out without ordering anything. In other words, we would say that the relationship is optional at the orders end. This is shown by the little O at that end of the relationship dotted line.
A data modeler would say “For every customer, there can be zero, one or many orders”.
|
TERM |
DEFINITION |
|
Customer |
Any unit that can raise a demand. |
|
Demand |
A request for assets to be supplied. The format of a request can be an electronic message, a paper form and so on. |
Business Rules: A customer can raise zero, one or many demands.
: A demand must be associated with a valid customer.
4.8 Many-to-Many Relationships
This diagram shows a many-to-many relationship between orders and products.
An order can include many products and a product can appear on many orders.
This defines a many-to-many relationship and is shown in a data model as follows:

A many-to-many relationship cannot be implemented in relational databases.
Therefore we resolve this many-to-many into two one-to-many relationships, which we show in a data model as follows:

Sometimes it is useful to see the key fields to ensure that everything looks alright.

When we look closely at this data model, we can see that the primary key is composed of the Order_ID and Product_ID fields.
This reflects the underlying logic, which states that every combination of order and product is unique.
In the database, this will define a new record.
When we see this situation in a database, we can say that this reflects a many-to-many relationship.
However, we can also show the same situation in a slightly different way, which reflects the standard design approach of using a surrogate key as the primary key and showing the demand and product IDs simply as foreign keys.
A surrogate key is simply a key that stands for something else.
We use one when it is a better design or is simply more convenient.
It is a meaningless integer that is generated automatically by the database management software, such as Oracle or SQL Server, The values are usually consecutive integers, starting with 1,2,3,4 and so on.
The benefit of this approach is that it avoids the occurrence of primary keys with too many fields if more dependent tables occur where they cascade downwards.
The benefit of the previous approach is that it avoids the possibility of orphan records in the Products in a Demand Table.
In other words, invalid records that have invalid demand ID and/or product ID values.

|
TERM |
DEFINITION |
|
Order |
A request for products to be supplied. The format of a request can be verbal, an electronic message, a paper Form, etc. |
|
Product |
An Item that can be supplied on request. It can be something small, like a muffin, or something that contains other products, like a sandwich with multiple fillings. |
Business Rules: An order can refer to zero or many products.
: A product can appear in zero, one or many orders.
: We can also say “An order can refer to many products and a product can appear in many orders”.
: In other words, there is a many-to-many relationship between orders and products.
4.9
Hierarchies and Rabbit Ears
Hierarchies
are very common and we can see them all around us.
Fortunately, we can handle them every easily in data models.

This
diagram shows how the hierarchies of products
and product types that we have just discussed are shown in our
Entity-Relationship
Diagram.
You will notice that the table called 'Product_Types' has a dotted line coming out on the right-hand side and going back in again on the top-right corner.
Data analysts call this a recursive or reflexive relationship, or informally, simply rabbit ears.
In plain English, we would say that the table is joined to itself and it means that a record in this table can be related to another record in the table. This approach is how we handle the situation where each product can be in a hierarchy and related to another Product.
For example, a product called Panini could be in a product sub-category called 'Miscellaneous Sandwiches' which could be a higher product category called 'Cold Food,’ which itself could be in a higher product super-category called simply 'Food'.
Next time you go into Starbucks, take a look at the board behind the counter and try to decide how you would design the products area of the data model.
You should pay special attention to the little 'zeros' at each end of the dotted line.
These are how we implement the fact that the Parent Product Type Code is optional, because the highest level will not have a parent.
This tutorial is also available on the Database Answers Web site:
A number of data models show examples of inheritance, including: