Information System DB Design
Table of Contents
Requirements analysis
Although the focus of this article is DB design here are some things to consider from the requirements:
- Purpose of the information system.
- Features based on needs, never on assumptions.
- “must be able to…”
- Go from open to close questions.
- Describe user roles, and privileges.
Data Modeling
Identify
- required data
- draft/review mandatory input forms, and reports
- problem domain concepts
Model
- organize concepts as sets of attributes
- an instance of a concept is
- a tuple of attribute values
- visualized as a row.
- fields collected according to a tuple of attributes
- make a relation
- are generally visualized as tables
- are shown as columns, for single attributes
Table design
A table may stand for
- a concept with definite attributes. eg. name -> 1st, middle, last
- an attribute with indefinite descriptors. eg. phone -> work, mobile, etc
Also, consider
- extracting redundant attributes into a new table
- using associations for composite concepts
Test designs
From the first schema implementation we should test our table, and associations designs using sample/fake data.
- check all operations work as expected
- define associations
- improve ease of operation
We should iterate over table, and association designs to avoid modifying their attributes once in production.
Relevant Sequel pages:
Also, since test data hardly ever produces the best optimizations for real data we should avoid creating indexes, and the like, during testing.
Associations
Like DB tables, associations also stand for concepts in the problem domain. They are used to simplify the code that deals with related tuples in separate DB tables.
We can only associate DB tables which share key attributes.
Keys
We enforce uniqueness by assigning an attribute as the key. Whenever we need to use more than one attribute to differentiate each tuple we call it a composite key.
Primary key
A table’s unique identifier is called primary key. To assign a primary key to a table we need to identify (in order)
- super keys. all combination of attributes that uniquely id a record.
- candidate keys. super keys which are
- irreducible
- always unique
- never NULL.
- immutable.
- primary key. The candidate key with least attributes.
- should not expose sensitive data.
- prefer single attribute keys over composite ones.
- artificial/surrogate key. unique auto-generated random id.
- natural/business key. From among existing attributes.
Foreign key
On every association, one of the tables has to hold the primary key of the other. Although is meant to connect tuples in different tables it’s called foreign key.
To avoid orphaned tuples across tables, whenever we delete a tuple from any of the tables in an association we must either:
- Disallow changes.
- Update the tables in the relationship.
- Hide, and flag connected tuples as deleted
- Delete connected tuples
Many-to-One
Two tables are in a many-to-one (*-1
) association when zero, one, or more tuples in one table are associated to a single tuple in a different table.
| albums | | artists |
|------------| |---------|
| :id | /---1- | :id |
| :artist_id | -*---/ | :name |
| :name |
Using Ruby’s Sequel toolkit
class Album < Sequel::Model
many_to_one :artist
end
One-to-Many
Two tables are in a one-to-many (1-*
) association when every tuple in one table is associated with zero, one, or more tuples in another table.
| artists | | albums |
|---------| |------------|
| :id | -1---\ | :id |
| :name | \--*- | :artist_id |
| :name |
This association is complimentary to many-to-one. Using Sequel,
class Artist < Sequel::Model
one_to_many :albums
end
One-to-One
A one-to-one (1—1
) association may be necessary when we have to split a table into two or more tables for security or performance reasons.
| users | | phrases |
|---------| |----------|
| :id | -1---\ | :phrase |
| :email | \--1- | :user_id |
This is considered a subset of the one-to-many association.
class Artist < Sequel::Model
one_to_one :album
end
Many to Many
A many-to-many association (*-*
) allows each tuple in a table to be associated to many tuples in an associated table, and vice versa.
| albums | | artists |
|------------| |---------|
| :id | /---*- | :id |
| :artist_id | -*---/ | :name |
| :name |
To satisfy both conditions, this association is broken into two one-to-many associations using a joined table to associate them.
| albums | | albums_artists | | artists |
|--------| |----------------| |---------|
| :id | -1-\ | :album_id | /-1- | :id |
| :name | \-*- | :artist_id | -*-/ | :name |
Using Sequel, and assuming the albums_artist
table exists
class Artist < Sequel::Model
many_to_many :albums
end
One-through-one
The one-through-one association sets up a one-to-one association through a joined table.
The difference between the one-through-one, and many-to-many association is that in the former there can only be 0 or 1 records in the associated table.
This association is useful for
- enforcing unique references, in the join table, to one of the tables.
- ordering the association, and only return the first record.
Using Sequel
class Artist < Sequel::Model
one_through_one :album
end
Normalization forms
Normalization is a way to organize data. There are various normal forms, each dependent on the previous one.
1st Form
- every attribute field can only hold a single a value
- eliminate multiple values per field
- assign a primary key
- combination of fields/attributes that uniquely id a tuple/record
- extract the attributes whose field values cause inner tables
It’s usually easier to apply the 1st form in this order
- eliminate multi-values
- eliminate repeating groups
- assign primary key
2nd normal form
All attributes must be related to the primary key. Extract any attribute which doesn’t refer to the primary key, into it’s own table.
3rd normal form
Remove transitive dependencies:
- Ensure none of the non-PK attributes depends on a non-PK attribute.
- Remove attribute values based on other attributes. Instead, generate them dynamically.
More normalization
There are other four normal forms. Beware, the more we normalize, the more we decompose our data into smaller tables. Which could make it difficult to write queries due to multiple Join
.
- Boyce-Codd
- extract all non-key attributes that don’t depend on each candidate key
- Symmetric Constraint (4NF)
- ensure all attributes of equal importance are part of the CK
- Projection-Join (5NF)
- Break the CK into associations
- Irreducible Relations (6NF)
- Break relations into the smallest set of attributes
- Mostly used for interval attributes
Although we make reference to a composite key in forms 4 and 5, at that point those should be the only attributes in the relation.
Data Integrity
Some of the techniques we can use for data integrity are:
- Define each attribute’s set of permitted values
- Constraint attributes types
- Trigger a chain of mandatory procedures in response to an event.
Sequel relevant documentation:
Transactions
A transaction is the execution of several operations considered a unit of work. A transaction is consider reliable when it has these properties
- Atomicity. All operations must succeed or nothing changes.
- Consistency. Data moves from one valid state to another.
- Isolation. DB state after executing a transaction concurrently or sequentially is the same.
- Durability. Changes should be persistent.
Sequel relevant links:
Advanced features
Once the system has been in production for a while we should analyse DB statistics to improve the DB performance. Consider using features such as
- index
- functions
- views
- prepared statements
- windows
Keep in mind that some DBMS, such as postgres, provide details of their implementations so we can use the best one according to our needs.
Sequel relevant links