Tuesday, February 12, 2019

Blog 5 - Relational Database Theory

Databases are so important to modern technology and business and are used in many places that are no immediately apparent. A lot of those databases either still are, or began as relational databases. A relational database is one where all the data is based on the relational model rather than a hierarchical model. What this means is that the data is organized into tuples and grouped into relations. A tuple is just another word for a row of a table. This kind of data model was first described by Edgar F. Codd in 1969. The language created to query databases of this kind was SQL (Structured Query Language). The point of SQL was that you would be able to retrieve "all the data with a single query." (1) The benefit of this is since the database is made up of relations, you can use one query to follow those relations and grab the data that you are looking for. A NoSQL database would require several queries to do the same thing.

The design of these databases revolves around creating these relations so that you can easily query your database afterwards. This is because "A database consisting of independent and unrelated tables serves little purpose (you may consider to use a spreadsheet instead). (2) Without relations, a database would be largely unnecessary. There are 3 types of relations that can be modeled in a relational database: one-to-one, one-to-many, and many-to-many. An example of a one-to-one relationship would be having a table of products then having a table that had supplementary information about each product. You would have a row in the product table that would correspond to one row in the product info table. An example of a one-to-many relationship would be a teacher and their students. In a classroom, there is only one teacher but there are many students. An example of a many-to-many relationship would be students to classes. Each student has many classes and each specific class has many students.

A big tenet of relational database design lies in database normalization. The three reasons that you would care about database normalization are that "to minimize duplicate data, the second is to minimize or avoid data modification issues, and the third is to simplify queries." (3) This is very important in the design for many reasons but one in particular is that if you build your database at a certain size but later down the road need to expand it for some reason, if the database is not normalized then it will a very time consuming, laborious process to ensure that you are not losing any data in the transition. To normalize a database, you need to ensure that it fits a series of normal forms. An unnormalized database would be in the UNF form. These normal forms help to ensure that you do not have the errors listed above. It starts with First Normal Form (1NF) and makes its way up to Sixth Normal Form 6NF) with some others along the way. As the normal forms get higher in number, the more stringent their constraints there are. Most databases shoot for Third Normal Form (3NF) as that is sufficient for most applications. Some databases do require higher normal forms however. These normal forms were described by Edgar F. Codd as well.

References

[1] How NoSQL, Relational Databases Do (And Don't) Help Companies Meet GDPR Requirements. ItProToday. <https://www.itprotoday.com/no-sql/how-nosql-relational-databases-do-and-dont-help-companies-meet-gdpr-requirements>
[2] A Quick-Start Tutorial on Relational Database Design. yet another insignificant... programming notes. <https://www.ntu.edu.sg/home/ehchua/programming/sql/relational_database_design.html>
[3] Database Normalization Explained in Simple English. essentialSQL. <https://www.essentialsql.com/get-ready-to-learn-sql-database-normalization-explained-in-simple-english/>

Comments

To Albert Hanan
Albert, your bit about Amazon Aurora is very interesting. I am familiar with databases and the claims that Amazon is making are very intriguing as to if they are legitimate.

To Nick Maloney
Nick, your comment about the use of SQL in BIM is very interesting. I know how to use SQL and having that knowledge I can see so many uses for SQL within BIM and the AEC industry. I feel that if the AEC industry were to embrace SQL and databases more fully, we would be much more efficient.

To Yidi Li
Yidi, I think that construction firms could benefit greatly from relational databases. I think that that kind of integration would lead to much more efficient firms and less logistical problems.

1 comment:

  1. Richard, learning about normalization of tables in the relational database theory was a great learning experience that I did not come across in my research. Getting that in depth into this research has clearly provided you an extra level of understanding on the subject matter. If most databases aim to reach 3NF, you mentioned that some databases require a higher level of normalization. Are there any that require to reach 6NF, as the maximum level? I would be interested to learn which databases require this, and what their primary uses are. Also, I am curious about the same topic, but for 1NF. Why would a database strive for the most basic level, when the norm is 3NF?

    ReplyDelete

Note: Only a member of this blog may post a comment.