Design Tip – None or NA Instead of NULL

We are working in a world of existing applications with known technical issues couple with development and business techniques that emphasize moving quickly to fail quickly and learn from the mistake. Now that does not mean you should be selectively ignorant when beginning a new design and forget all the fundamentals. If you are the developer, then it should not be a problem to change. If you work more on the database side of things, then it can take some work to convince your developer colleagues to try something new.

The SQL Language and the underlying relation database theory does not handle NULLs well. Consequently, if you can design to avoid them, you should.

Two Fundamentals to consider

The NULL value is problematic when it comes to SET theory.

Imagine a table containing address information with a column for State or Province. The table contains 3 records with Florida as the state, 2 with NULL value as the state, and the other 45 records have various other states. 50 records in total. A query searching for all the Florida records would be rather simple and the WHERE clause looked like STATE = FLORIDA. You run the query and the 3 records are returned. Now if you want the records that are not in Florida intuitively the WHERE clause would look like STATE <> FLORIDA. The query will return 45 records omitting the 2 records with NULL value as the state. Not What you would expect. Your WHERE clause would need the extra or condition of STATE <> FLORIDA or STATE IS NULL.

The NULL values are not indexed.

Imagine a situation where the absence of a value was a criteria for retrieving and processing a record. So, something that most developers are familiar with is a ticketing system for software errors. The screen where you would login display all the records from a ticket table where the Assigned developer was equal to NULL(is NULL is the keyword) Logically this works and if you deployed this application it would appear to work normally as long as the table itself resided in the database server’s memory buffers. You would not notice until the table was very large that there was a problem and the search become slow.

Academically this is not discussed much because the slow behavior is related more to how the RDBMS implements the execution of SQL statements rather than the theory of the language. Basically it is platform specific. But if you replaced the NULL value with a string like unassigned. The problem is solved. The value unassigned will become part of the index and you can benefit from the index when the table grows in size.

Here is some of my favorite string replacements for NULL

  • NA
  • None
  • Unavailable
  • Undefined

So, are you convinced?

Let me know in the comments section what you think about limiting NULL values when possible. I know that there are NULL values very often in Date/Time that cannot be populated with a arbitrary value when the field has no value. That will be the subject of another blog post.

Have a nice day.

SQL Tip – Working With the UNION Clause

I have been working with the SQL language for very many years. I take for granted that most developers are not very experienced in using commands outside of simple select, insert, update and delete statements. Certainly, this can be a disadvantage when working. I would like to explain a few tricks when using the UNION clause.

Union basically joins the result of two separate results sets with the condition that the result sets have the same number of columns and the same or very similar data type. Naturally, the columns need to be in the same order. This seems very intuitive if you can imagine that you have two tables that are similar, and you want to produce a result set that has records from both. But you can also use records from the same table or two more complex queries to produce the result you need. Let me walk you through the details and some examples.

First the details…

There is UNION and UNION ALL

UNION joins the results of two queries UNION ALL also joins the result of two queries. UNION does not include duplicates; however, UNION ALL does include duplicates. Let us walk through the logic quickly. In the example below imagine A,B,C… as records.

Select 1 returns A,B,C


Select 2 returns C, D, E

UNION of the two -> A,B,C,D,E

UNION ALL of the two -> A,B,C,C,D,E

Higher cost of UNION vs. UNION ALL

Since the UNION clause does not include duplicates the database engine needs to sort the two result sets and eliminate the duplicates. If you are selecting records from two different tables and you know it is not logically possible to have a duplicate, you should use the UNION ALL. For example, you are querying a SALES_2020 table and SALES_2021 table and one of the columns is the date of the sale. Logically there should not be any duplicate records between the 2020 and 2021 tables. UNION also has the undesirable surprise that a GROUP BY is applied without you knowing. UNION will eliminate any duplicate not only between Select 1 and Select 2 but also UNION will eliminate any duplicates within Select 1 or Select 2.

Now some examples…

We can see in table SalesQ1 that John appears three times. Twice John purchase $100 on the same day. A normal select of the of three columns SalesDate, Customer and SalesAmount would return 5 records.


It would be intuitive to simply union the same query on the second table and have a result that combines the two sets of data together. Surprise the result will not be what you want.

SELECT SalesDate, Customer, SalesAmount
  FROM SalesQ1
SELECT SalesDate, Customer, SalesAmount
  FROM SalesQ2;
Result of UNION

John does not appear 3 times anymore in Q1’s dataset. The two records that had the same SalesDate, Customer and SalesAmount were grouped into one. Like I explained to make the query do less you need to use the UNION ALL.

SELECT SalesDate, Customer, SalesAmount
  FROM SalesQ1
SELECT SalesDate, Customer, SalesAmount
  FROM SalesQ2;
Results of UNION ALL

To illustrate how wrong things could look sometimes we can do it again without the date.

SELECT Customer, SalesAmount
  FROM SalesQ1
SELECT Customer, SalesAmount
  FROM SalesQ2;
UNION Only 4 rows???

We see that the UNION clause essentially grouped the records from the first table but also the records that have the same values in either table are grouped together. If we use the UNION ALL the result looks like all the records from both tables.

SELECT Customer, SalesAmount
  FROM SalesQ1
SELECT Customer, SalesAmount
  FROM SalesQ2;
Big difference ehh?

If the selection in each table was unique, the UNION and UNION ALL clauses would return the same results. Often things look that way in the beginning but if you are the person working with reports and ETL you often have to work with set of data that you are not an expert with. I hope this example helps with your understanding.

Have a nice day.

SQL Tip – Update Many Records in One Table With Another

I did not wait a long time for the inspiration to write this next tip. The update statement can be confusing if you are not writing SQL on a regular basis. I will explain how to update a table with the records from another table. This is different than the more common single or multiple record update with a literal value. What developers often want to do is substitute the literal value with a query on the right side of the equal sign of the “SET” clause. This does not work. If the query resolves to a single value, it will update all the records with the same value.

In a case like this, it is good to return to basics. I will give you three examples that will help you remember how to do this on your own.

Single or Multi Record Update with a Literal

Most developers are familiar with the syntax that looks something like this:


This example there is a single table (ADDRESS) and we find the exact record (ID 65) using the “where” clause and the field PROVINCE is assigned the literal value QC. Probably at some point in your programming life you created a web page or script that did something like this.

In the example above, you updated 1 row. You could have changed the “where” clause to update more than one row by choosing criteria like the POSTAL_CODE beginning with H or G or J. (Those would be Quebec Postal Codes). Still pretty easy to understand. The selection of records that meet the criteria have changed and more will be updated.

Update a Field With Another Field in the Same Row

You could look at the table and realized you actually should have not put the literal QC in the PROVINCE field but in the PROVINCE_CODE field. You can update the value of one field in a table by assigning it the value of another field in the table. The command would look like this:


That is the part you need to remember. The syntax to update every record using a value in the same table is the same syntax you used to update every or most records in one table with values from another table.

Updating a field with another field in the same row of the query

So now the part you want.

You need to create a query that combines the information in a single result. Then you assign the value in one field to another field. Let’s take an example where I have a PROVINCE_CODE field in the address table and I want to update the field PROVINCE in the same table. The lookup I will use is in another table called PROVINCE_LOOKUP that simply has two columns PROVINCE_CODE and PROVINCE. Example QC and Quebec.

This is what it looks like:


So understanding these three kinds of updates always helps me remember the last one.

Have a nice day.

SQL Tip – Finding Duplicate (or Having)

Most developers have a basic knowledge of the SQL commands “Select,” “Insert,” “Update” and “Delete.” Sometimes you want the database to return something different than was initially planned and that is when you need the SQL statements to do more. Being asked to find duplicates is a request that comes up repeatedly.

Early in my career, I was fortunate enough to receive a five-day SQL PL/SQL course at Oracle. It was a great experience and really helped me better understand how the language should be used. I remember the sprit of the lessons and the logic behind the explanations. That is what I want to share with you. I will try to always use SQL-style language to reinforce the concepts.

Now to find duplicate records you first need to identify exactly each field that you consider important in the duplication. There are often columns like the record ID or a date/time fields that are unique between two otherwise identical rows. So, analyze some records to determine what fields you will actually select to determine duplication. Fields that are always unique like a record ID are not needed.

Note: In a procedural language, you would normally identify a duplicate by sorting the information and then examining each record with the preceding record to match if it is the same. The relational database software will actually do that in the background in a variety of ways. But in SQL you do not work with this sort of step-by-step concept. You work with sets of data.

In SQL language you create a set of data by selecting the fields and applying filter criteria. The “where” clause in a select statement is probably know by every developer. The lesser know is the “having” clause. The “where” clause is applied to the set of data before any grouping and the “having” clause is applied after the grouping.

Let’s look at an example. I want to run a query that would find the first two records in the example below but not the third record.

I can see that I will need to select the Street Number, Street Name, City, Province, Postal Code. Those are the fields that are duplicated.

So, the query you write needs to look like below. The fields are selected. The same fields are grouped. The “count” function counts the number of records that are included in the grouping just made. Don’t let the 1 fool you. It is just a literal. You can specify a single field like “Street Number” or a “*” the result would be the same. The one is traditionally used for a legacy performance reason in Oracle. The “having” clause now evaluates the records that have a property in this grouping of having more than 1 record. I could have just as easily put “= 2” to get the records that only appear twice and not three times. It is really up to you.

SELECT Street Number, Street Name, City, Province, Postal Code
GROUP BY Street Number, Street Name, City, Province, Postal Code

Voila! That is all that there is to this. One record will be returned in this example.

Have a nice day.

Design tip – Enumeration vs. Lookup

I wanted to provide a simple list of good practices for database design. As I started writing, I realized that the article would be very voluminous and difficult to write at once. So, I picked a single topic that I think the developer will eventually need to think about.

In my experience, the enumerated type in a platform solves one or more problems for the developer. So as the developer notices this possibility, they simply ask me, “Which one should I use, or which is better?”

In case you need some context. Enumeration is very useful in most object-oriented platforms. A fix set of values is coded into the definition and the values are stored directly into a database column. For example, a contact may have “Home,” “Work” and “Other” as values. The object or record would then become a member of one of three possible sets. Two advantages that I understand are: At runtime, the error message is quite clear and meaningful when you assign an out of bound value. Also, some IDEs display the available choices for the enumerated type when coding thus avoiding mistakes.

So, which is better? Like most database questions the answer, it depends. I admit that from my perspective enumeration or ENUM datatypes solve some database problems too. The database column can be defined as an ENUM type. This will provide database integrity protection for the developers that come after the initial development. This is also quite helpful because the person who at a later time needs to analyze the data in the database does not need to look into the application code to find the valid values or query all the records to see what values are there. The ENUM in the column definition list them all. So I only see the benefit and would generally support ENUM type use.

A different view I’ve read in developer forums regarding enumeration and lookups is that enumerations is for something simple that never changes, example left or right. Lookups are used for everything else because of the following two reasons: database integrity and simplicity when adding a new type. From the database and long-term maintenance point of view, I disagree.

The lookup table for the purpose of referential integrity is not a valid argument. The same is achieved by the ENUM type. You also avoid an unnecessary table which I consider a nice benefit. If you are developing a platform, you might want to consider a lookup table but if you know in advance that you have a reasonable finite list you should think otherwise.

The argument that you can dynamically add a value to a table to change a type can be a good one. I would add the absolute condition that part of the development needs to have a user interface so there would be absolutely no special intervention required. If you are asking a developer to manually insert records in a type, you can just as easily modify the type in the table. So there is no real benefit.

Something to think about.

What this site will be about?

The Author

Welcome to my blog SQL Clinic. SQL Clinic is a blog that serves the application developer community with tips about the SQL language, configuring databases and database design tips. I hope, eventually, that the information that I provide is helpful when looking for solutions and insightful when thinking about new designs.

If you are a programmer that subscribes to this blog. I intend to publish design tips at least once a month. I have worked as a DBA with Oracle, MySQL, MS SQL, DB2 and PostgreSQL databases for more than 20 years. Most recently I am working with Amazon and Azure databases. My work has centred around databases, the ETL that feeds them and the platforms that use them. So reading my tips would hopefully help you.

There is no up-sell to this blog at this time. If that ever changes, I will be very clear if I am receiving any compensation for the software or service that I promote. My inspiration for this work is to diminish programmer frustration through sharing of knowledge.

Have a nice day.