Database Design Considerations

While working for a previous development company, I was asked to provide a list of some high level best practices when it comes to database design. For many of the developers I worked with here (and many places in the past) SQL development is secondary to primary coding language such as .NET, C# or some other language. As a result of this and oftentimes being under pressure to produce results quickly many things slip through the cracks. Or not identified as a problem until its too late and inevitably gets added into the “technical debt” black hole.

These things were not meant to be the “end all be all” but a place to start and to get developers thinking early on.

  1. Plan your design
  • Understand your database’s purpose – Before you begin anything you have to understand what you are building and why you are building it
  • Requirement Gathering – What resources do you need? People, Infrastructure, Tools, etc. What size do you expect this to be? Is this heavily utilized or not? Is this business critical? Etc. While you may not be able to answer every question, you should have an idea.
  • Determine usage patterns – What type of database are you building (OLAP\OLTP, Normalized\De-normalized, etc.)? How do you expect that this database will be used?
  • Draw out your design – Draw out\Document your design somewhere. You may not know everything you need, but you can create a base to build from. Your design should be flexible so that you can easily add to it.
  • Get input from others – Some of the best design comes from working and discussing with others. Utilize the experience and knowledge of others
  • Consider all database options – Are there specific features or changes you want to make before getting started? For example, changing isolation levels, using compression, how to handle deletes, etc. These are things that are easy to implement in the beginning and much harder to do later. Think ahead about things you may need.
  1. Documentation
  • You should have adequate documentation that showcases your design and clearly communicates it to others. You should provide enough information that when you turn the database over to someone else to support it, they can figure out bugs and fix them.
  • By carefully naming your database objects, you can provide a lot of self-documentation on your database. A good naming convention will make it clear to anyone what your database is modeling.
  • Here are some reasons why documentation is important
    • Provides a common language between the business and IT
    • Makes it easier to locate areas of potential issues
    • Changes of making bad decisions is lower because risks are easier to assess
    • Makes maintenance\outages easier to manage and reduces risk
    • Acts as mediator between newcomers and existing staff
    • Reduces the likelihood of misunderstandings
  1. Create a naming standard and commit to it
  • There are several different naming conventions and many have their own style which is fine. However, the most important thing is to be consistent in how you name things and keep things readable. A good naming convention will make a database very readable for anyone who supports the system after you.
  • The following are some guidelines when creating your naming convention
    • Use unique descriptive names that make sense
    • Makes your different objects have distinguishable names. You should be able to tell the difference between a proc, a table, a view, a function, etc. based on the name alone.
    • Avoid using just “ID” as the PK of each table. This makes the database harder to read and requires aliasing when reading multiple ID columns from multiple tables
    • Beware of using SQL Server reserved words in object names (ie. Date, User, Case, etc.). This can create syntax issues if brackets are not used.
    • Avoid using hyphens, spaces, quotes, etc. in object names
  • The following is an example of naming conventions for a few objects
    • Primary Key – PK_
    • Stored Proc – sp_
    • View – vw_
    • Function – fn_
    • Index – ix_
    • Default Constraint – DF_
  1. Testing
  • Test everything and be prepared to make changes.
  • If someone disagrees with you on a design decision, be prepared to generate tests that show the results. What someone thinks or is told is not adequate means to change a design. You should be able to back up any decision you make
  1. Carefully consider what datatypes you use
  • You should always to try to use the smallest datatype you can while taking into account future growth. Choosing the correct datatype can have a large impact on storage and performance. Take time to carefully consider what you use and determine if it is the best use for the scenario.
  • Do not sell yourself too short either, you want to avoid needing to change datatypes later.
  • Understand Unicode vs Non-Unicode and when you should implement one over the other and make sure you are consistent.
  1. Understand normalization and how to implement it
  • I think the level of normalization depends on the project, but it is important to understand the benefits and disadvantages of it during your design.
  • Benefits of normalization
    • Smaller database. By eliminating duplicate data, you will be able to reduce the overall size of the database
    • Better performance – By having more narrow tables, it allows you to have less columns and fit more records per page. Fewer indexes per table means faster writes and maintenance, and you only need to join tables that you need.
  • Disadvantages of normalization
    • More tables to join
    • Tables contains codes instead of real data so you have to go back to the lookup tables for values
    • Difficult to query against. It is optimized for applications, not ad hoc querying.
  1. Use a single column as your Primary Keys
  • When you are designing a new database you should try to design things so everything connects via single integer columns and use foreign key relations from other tables. SQL Server processes number datatypes faster than character datatypes so this increases performance. They also reduce the size of data that is being stored on both the table itself and on any additional indexes. The PK columns are hidden columns in all indexes created.
  • Your primary key will be referenced in other tables so you want it as small and as fast as possible. Composite business keys force you to duplicate data in foreign key relationships and causes more data to be read on joins.
  • Avoid using composite business keys as your primary key. If you need uniqueness, then apply a non-clustered unique index
  • If you do not have a natural primary key to use, then create an identity and let SQL Server assign the number
  • For existing development, then it may not be possible or make sense to do this so instead you may need to work within the confines of the existing database structure
  1. Most everything should have a PK/Clustered Index defined
  • You should have some way to uniquely identify a row in each table
  • The only exceptions to this could be staging tables or ETL tables
  • A primary key is important for SQL server, not only to allow for processes such as replication, Change Data Capture, filetables, audit trail, and data synchronization, but also to simplify the creation of foreign key constraints.
  • Fragmentation that occurs from updates, deletes, inserts
  • Elimination of uniquifiers when performing a table lookup
  1. Use SQL Server built-in functionality to protect data integrity
    1. Nullability – NULL is used explicitly to indicate when a value is unknown, undefined, does not, or cannot exist. If you use NULL values make sure that your code and design can easily support this.
    2. Constraints – (Including NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, DEFAULT, INDEX). These are used to specify rules for the data in a table by limiting the type of data that can go into a table. This ensures the accuracy and reliability of the data stored.
    3. String length – Think about your string length before assigning a value. Plan each column carefully and define size accordingly. Note that anything over 4000 characters will be store as Lob_data and can only contain full-text indexes.
  2. Used stored procedures for SQL code
  • All SQL code should be stored within stored procedures
    • Maintainability – Stored procedures provide a known interface to the data. When code that accesses the database is compiled in a different layer, tweaks cannot be made without a programmer’s involvement. This makes things easier to troubleshoot.
    • Encapsulation – Logic can be stored within a single place and changes without impact to the application
    • Security – More tightly security can be implemented. Specific and granular access can be granted
    • Performance – Plans get re-used more often with stored procs. Additionally, code stored here is much easier to performance tune.
  1. Proper Indexing
  • If you are reading from a table, it needs to have a clustered index
  • Create non-clustered indexes on the columns that are frequently used in predicates and join conditions. These are typically your foreign keys.
  • Understand the characteristics of the database and its most frequently used queries so you can index appropriately so you can create covering indexes for your more critical queries.
  • Determine the optimal storage location for an index. For example, it may benefit from being on a separate filegroup
  • Avoid over-indexing heavily updated tables and keep indexes narrow (as few columns as possible).
  • Consider the order of your columns in your index
  1. Keep it simple
  • While there are certainly times that complicated coding is required, try to keep things as simple as you can. Simple designs are much easier to manage
  1. Consider using flags instead of DELETE
  • Strongly consider whether you should be deleting data from this database. You can implement Delete flags to indicate when a row is no longer active. If it is needed it can always be changed back. It is better to do this from the beginning so that the flag is written into the code to check.
  • The Delete flag should be stored in parent tables

 

Author: Dustin Dorsey

Dustin is great!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: