derekv
derekv

Reputation: 3141

When would combining columns into a single, delimited column be better in a RDB schema?

Consider for example the case where you have two peaces of data, where one value is rarely used without the other. As one example, here is a table holding user authentication data :

CREATE TABLE users
(
id INT PRIMARY KEY, 
auth_name STRING,
auth_password STRING,
auth_password_salt STRING
)

I think that password is meaningless without salt, and the other way around. I also have the option on representing the data this way:

CREATE TABLE users
(
id INT PRIMARY KEY, 
auth_name STRING,
auth_secret STRING,
)

And in auth_secret, store strings such as D5SDfsuuAedW:unguessable42

In general, are there any situations where combining columns into one, delimited column would be a better choice?

Even if it is never a "better choice" overall, are there any costs (performance, space, anything) to having more columns vs fewer columns (for the same data)? My motivation is better understanding and to be able to more competently argue against it when someone suggests this sort of thing.


--edited I changed the example... original example as follows:

CREATE TABLE points
(
id INT PRIMARY KEY, 
x_coordinate INT,
y_coordinate INT,
z_coordinate INT
)

vs

CREATE TABLE points
(
id INT PRIMARY KEY,
position STRING
)

In position, storing strings such as 7:3:15

Upvotes: 2

Views: 109

Answers (3)

Ben
Ben

Reputation: 52863

The only possible answer to this question is never. Never, ever, store delimited data in a column. It defeats the entire point of columns, which are there to delimit your data, and makes it inordinately difficult to do anything that a database has been designed to do. It's a violation of normalisation so huge that you'll spend hours on Stack Overflow trying to correct it in a months time.

Never do this.

However, "never say never".

In certain, extremely limited, circumstances it's okay. Never assume it's okay but it can be.

A good example is Stack Overflow's own Posts table, which stores the tags in a delimited format for quick reading. The tags a question has are read from the database far more often than they are edited. The tags are stored in a separate table, PostTags, and then denormalised to Posts when they are updated.

In short, even though you can denormalise your data in this way, don't. Try everything possible to avoid it. If you come across a situation where you've been optimizing for days and the only way to get something quicker is to denormalize, then it's okay. Just ensure that you are only ever going to read data from that column and you have a secondary process in place to ensure that it is kept up-to-date. If the update of the denormalised data fails, roll everything back to ensure that your data is consistent.

Upvotes: 2

You left out a significant option: create an appropriate user-defined data type. (PostgreSQL has long had an intrinsic data type for 2-space.)

These implementations differ quite a lot.

But you might not have the luxury of using one of those platforms. You might have to use MySQL, for example, which doesn't support user-defined data types.

Relational theory says that data types can be arbitrarily complex; they can have internal structure. The most common data type that has internal structure is the type "date". Relational theory specifies what the dbms is supposed to do with data types like that. The dbms must either

  • ignore the internal structure entirely, or
  • provide functions to manipulate the parts.

In the case of dates, every SQL dbms provides functions to manipulate the parts.

You can make a good argument for a single column that stores 3-space coordinates like "7:3:15" in MySQL. To keep in line with relational theory, you'd want the dbms to ignore the structure, and return only the single value "7:3:15"; manipulation of parts is left to application code.

One problem with implementing something like that in MySQL is that MySQL doesn't enforce CHECK constraints. So it's a lot harder to prevent values like "wibble:frog:foo" from finding their way into the database.

Upvotes: 1

Oded
Oded

Reputation: 499062

You do that when there is no chance of needing to join, query, report or aggregate the data.

In other words - never. It is bad database design.

First Normal form (NF1) states that attributes should be distinct - it is the basic requirement.

Upvotes: 3

Related Questions