Nogiax
Nogiax

Reputation: 45

Table structure for data with many NULLs

I'm currently trying to model a dynamic data object that can have or miss some properties (the property names are known for the current requirement). It is not known if new properties will be added later on (but it is almost certain). The modeled object is something along the line of this:

int id PRIMARY KEY NOT NULL;
int owner FOREIGN KEY NOT NULL;
Date date NOT NULL;
Time time NOT NULL;
Map<String,String> properties;

A property can be of any type ( int, bool, string,... )

I'm not sure how i should model this object in an SQL database. There are 2 ways i can think of to do this and i would like to have some input which will be the better choice in terms of developer "work"(maintenance), memory consumption and performance. As a side info: properties are almost always NULL (not existant)

(1) I would have a big table that has id, owner, date, time and every property as a column whereas missing properties for a row are modeled as NULL. e.g.

TABLE_X
id|owner|date|time|prop_1|prop_2|prop_3|...

This table would have alot of NULL values.

If new properties should be added then i would do an ALTER TABLE and insert a new column for every new property

Here i would do a "usual"

SELECT * FROM TABLE_X ...

(2) I would have a main table with all NOT NULL data:

TABLE_X
id|owner|date|time

And then have a seperate table for every property, like this:

TABLE_X_PROP_N
foreign_key(TABLE_X(id))|value

Here would be no NULL values at all. A property either has a value and is in its corresponding table or it is NULL and then does not appear in its table.

To add new properties i would just add another table.

Here is would do a

SELECT * FROM TABLE_X LEFT JOIN TABLE_X_PROP_1 ON ... LEFT JOIN TABLE_X_PROP_2 ON ...

To repeat the question (so you don't have to scroll up): Which of boths ways to deal with the problem is the better in terms of maintenance (work for developer), memory consumption (on disk) and performance (more queries per second)? Maybe you also have a better idea on how to deal with this. Thanks in advance

Upvotes: 0

Views: 419

Answers (3)

jleach
jleach

Reputation: 7792

Sounds like you're trying to implement an Entity-Attribute-Value (often-viewed-as-an-anti-)pattern here. Are you familiar with them? Here's a few references:

https://softwareengineering.stackexchange.com/questions/93124/eav-is-it-really-bad-in-all-scenarios

http://www.dbforums.com/showthread.php?1619660-OTLT-EAV-design-why-do-people-hate-it

https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model

Personally I'm extremely wary of this type of setup in a RDBMS. I tend to think that NoSQL document style databases would be a better fit for these types of dynamic structures, though admittedly I have relatively little real-world experience with NoSQL myself.

Upvotes: 1

Daniel
Daniel

Reputation: 2804

If you go with Option 2, I would think you need 3 tables:

TABLE_HEADER
id|owner|date|time

TABLE_PROPERTY
id|name

TABLE_PROPERTYVALUE
id|headerID(FK)|propertyID(FK)|value

Easy to add new properties allow you greater flexibility and to iterate much faster. The number of properties would also have an effect (for example if you have 500 properties you aren't going to want a table with 500 columns!). The main downside is it will become ugly if you need to attach complex business logic using the properties as its a more complex structure to navigate and you can't enforce data integrity like not null for particular fields. If you truly want a property bag like you have modeled in your object structure then this maps easily. Like everything it depends on your circumstances for what is most suitable.

Upvotes: 1

arturro
arturro

Reputation: 1606

Solution 2. but why without separate tables for every property. Just put everything in one table:

properties(
foreign_key(TABLE_X(id))
property_name,
value);

Upvotes: 1

Related Questions