Database normalisation is perhaps the most difficult concept to understand for newcomers to databases. It is the most critical process in the design of databases and is often performed by database professionals. The concept applies to all relational databases including Microsoft Access and was developed by Edgar Codd for IBM in the early 1970s.This module offers a simplified explanation to database normalisation but should be attempted as the first module in the course as it provides a basis for understanding how to successfully construct queries. Please read the topics in the specified order.

Normalisation of data
First normal form
Second normal form
Third normal form
De-normalisation

1. Introduction.

Most modern databases are of the relational database type. To obtain the best results requires an understanding of data relationships and methods of applying it and how to use many other productivity tools provided. A well-designed database must fulfil the following requirements:

· Eliminate or minimise the duplication of data, and allow data to be entered and edited quickly and easily.

· Provide fast data access time in a large database, with clear and effective reporting quickly and easily as each user requires.

· Prevent invalid data from being entered, maintain data integrity, and prevent unauthorised access to confidential information.

· Be User friendly, and allow for future expansion as new needs are identified.

The source of your data can affect the design process. As well as manually entering data into tables you have created, data may be imported directly from an existing database, spreadsheet or other type of data file. Imported tables and data become part of your database and can be structured as required. Alternatively, the database may be linked to external data files, which remain in their original format and location, but are used as if they were part of your database. You will be able to change the data within these files but will not be able to amend the structure of them.

2. Normalisation of data.

One of the purposes of using a relational database is to cut down on duplication of data. This makes it easier to enter, search for, and use the data, and takes up less disk space. A normalised database is easier to maintain and can be modified more readily as requirements change. A large database will also run significantly faster when it is normalised.

Normalisation is basically a case of splitting duplicated data into separate tables and preventing data redundancy where fields will be left blank.

For example, a Contacts table may look like this, where the CoCode field is the primary key.

CoCode

CoName

Address

Location

Contact1

Title

Phone

Contact2

Title

Phone

MCW1

McWilliams

21 Arrow St.

Glasgow

S.Purdy

Buyer

534 9214

MCW2

McWilliams

14 Clyde Cres.

London

J.Taylor

Buyer

226 8123

W.Allen

Sales

226 8190

BRWN

Brown & Son

196 Wall Rd.

Luton

A.Green

Manager

920 1444

L.Walters

Director

920 1172

MCHL

Mitchell & Co.

Old Dover Rd.

Dover

D.Phillips

Buyer

876 9339

Figure 1.

Or like this:

CoCode

CoName

Address

Location

Contact1

Title

Phone

MCW1

McWilliams

21 Arrow St.

Glasgow

S.Purdy

Buyer

534 9214

MCW2

McWilliams

14Clyde Cres.

London

J.Taylor

Buyer

226 8123

MCW2

McWilliams

14 Clyde Cres.

London

W.Allen

Sales

226 8190

BRWN

Brown & Son

196 Wall Rd.

Luton

A.Green

Manager

920 1444

BRWN

Brown & Son

196 Wall Rd.

Luton

L.Walters

Director

920 1172

MCHL

Mitchell & Co.

Old Dover Rd.

Dover

D.Phillips

Buyer

876 9339

Figure 2.

Both tables contain data duplication of the company details, and the first table would need to be expanded to accommodate the company with the most contacts, and therefore wasting space (i.e. data redundancy). The first step of normalisation is to create a separate table for duplicated fields.

3. First Normal Form.

First Normal Form splits the data into tables, which have no repeating groups or redundant data. If you split it into too many tables initially, some of them can be combined later.

CoCode

CoName

Address

Location

MCW1

McWilliams

21 Arrow St.

Glasgow

MCW2

McWilliams

14 Clyde Cres.

London

BRWN

Brown & Son

196 Wall Rd.

Luton

MCHL

Mitchell & Co.

Old Dover Rd.

Dover

Figure 3 - Companies Table.

Contact

Title

Phone

S.Purdy

Buyer

534 9214

J.Taylor

Buyer

226 8123

A.Green

Sales

226 8190

D.Phillips

Manager

920 1444

W.Allen

Director

920 172

L.Walters

Buyer

876 9339

Figure 4 – Contacts Table

The result above does not link people in the Contact table with the companies in the Companies table, so it is necessary to duplicate the Primary key from the Companies table in the Contacts table also.

CoCode

Contact

Title

Phone

MCW1

S.Purdy

Buyer

534 9214

MCW2

J.Taylor

Buyer

226 8123

MCW2

A.Green

Sales

226 8190

BRWN

D.Phillips

Manager

920 1444

BRWN

W.Allen

Director

920 172

MCHL

L.Walters

Buyer

876 9339

Figure 5 – revised Contacts table

The Primary key in the Contacts table could not be either CoCode or Contact alone because both of these fields can correctly contain duplications, but needs to be a joint key (also known as a Composite key) – CoCode and Contact combined. CoCode in the Contacts table is a Secondary (also known as a Foreign key) linking back to the Companies table.

4. Second Normal Form.

Second Normal Form occurs when all of the non-key fields are uniquely dependent on the Primary key.

In the example below the Contacts table is extended to include Correspondence details - the date and type of contact made, and the name, department and position of the person making the contact.

CoCode

Contact

Title

Phone

Date

Type

Name

Dept.

Position

BRWN

D.Phillips

Manager

920 1444

7 May

Phone

RT

Accts

Credit Controller

BRWN

W.Allen

Director

920 1172

7 May

Phone

RT

Accts

Credit Controller

MCW2

J.Taylor

Buyer

226 8123

14 May

Letter

NWB

Sales

Manager

BRWN

D.Phillips

Manager

920 1444

14 May

Phone

RT

Accts

Credit Controller

MCW1

S.Purdy

Buyer

534 9214

16 May

Phone

JBC

Sales

Salesman

MCW1

S.Purdy

Buyer

534 9214

16 May

Letter

JBC

Sales

Salesman

MCW1

S.Purdy

Buyer

534 9214

16 May

Phone

NWB

Sales

Manager

MCW1

S.Purdy

Buyer

534 9214

18 May

Phone

RT

Accts

Credit Controller

MCHL

L.Walters

Buyer

876 9339

20 May

Phone

NWB

Sales

Salesman

BRWN

D.Phillips

Manager

920 1444

20 May

Phone

RT

Accts

Credit Controller

MCHL

L.Walters

Buyer

876 9339

21 May

Phone

JBC

Sales

Salesman

MCHL

L.Walters

Buyer

876 9339

22 May

Phone

JBC

Sales

Salesman

Figure 6 – Contacts Table

The new data breaks the Second Normal Form rule as the Dept and Position fields are not dependent on the joint key of CoCode and Contact. The details of the person making contact should be held in a separate Staff table.

Name

Dept

Position

RT

Accts

Credit Controller

NWB

Sales

Manager

JBC

Sales

Salesman

Figure 7 – Staff Table

The Staff table may be able to use Name as a primary key, or an additional field can be created containing a unique code for each individual. Either field could then be used in the Contacts table as a secondary key. The Contacts table then contains two secondary key fields, one linking to the Companies table and another linking to the Staff table.

5. Third Normal Form.

This is an extension to Second Normal Form in that all of the non-key fields should also depend on the Primary key, including the whole key in a multiple key field.

It is not really practical to keep correspondence type information in the Contacts table, as the contact details will be duplicated each time contact is made. It is better to create a separate table to hold correspondence data.

CoCode

Contact

Date

Type

Name

BRWN

D.Phillips

7 May

Phone

RT

BRWN

W.Allen

7 May

Phone

RT

MCW2

J.Taylor

14 May

Letter

NWB

BRWN

D.Phillips

14 May

Phone

RT

MCW1

S.Purdy

16 May

Phone

JBC

MCW1

S.Purdy

16 May

Letter

JBC

MCW1

S.Purdy

16 May

Phone

NWB

MCW1

S.Purdy

18 May

Phone

RT

MCHL

L.Walters

20 May

Phone

NWB

BRWN

D.Phillips

20 May

Phone

RT

MCHL

L.Walters

21 May

Phone

JBC

MCHL

L.Walters

22 May

Phone

JBC

Figure 8 – Correspondence Table.

In the Correspondence table only the Date and Type fields contain new data, the other fields are Secondary keys linking back to the parent tables – Contacts and Staff. The effect of this is that we now have four related tables – Companies linking to Contacts, linking to Correspondence, linking to Staff.

6. De-Normalisation.

The effect of normalisation is to create many small tables containing just a few fields, which can in itself create unnecessary duplication of key fields. De-normalisation reverses the process until an acceptable level of duplication and redundancy is achieved.

For example, a Staff database may keep a Staff History file. The current Salary details for each member of staff should rightly be kept in the Staff History file, but when the monthly payroll is run it would have to check the Staff History file for each member, slowing the process down. It is better to duplicate the current Salary in both places. Salary changes will take slightly longer to update because it needs to be changed in both places, but this is not performed as often as the monthly payroll, so it is a worthwhile trade off for speed in calculating the payroll.


	[Home] [Database Creation] [Queries] [Maintenance] [Reports] [Normalisation] [Switchboards] [Macros]