Johan Fredrik Varen
Johan Fredrik Varen

Reputation: 3914

Schema to support dynamic properties

I'm working on an editor that enables its users to create "object" definitions in real-time. A definition can contain zero or more properties. A property has a name a type. Once a definition is created, a user can create an object of that definition and set the property values of that object.

So by the click of a mouse-button, the user should ie. be able to create a new definition called "Bicycle", and add the property "Size" of type "Numeric". Then another property called "Name" of type "Text", and then another property called "Price" of type "Numeric". Once that is done, the user should be able to create a couple of "Bicycle" objects and fill in the "Name" and "Price" property values of each bike.

Now, I've seen this feature in several software products, so it must be a well-known concept. My problem started when I sat down and tried to come up with a DB schema to support this data structure, because I want the property values to be stored using the appropriate column types. Ie. a numeric property value is stored as, say, an INT in the database, and a textual property value is stored as VARCHAR.

First, I need a table that will hold all my object definitions:

Table obj_defs

id | name      |
----------------
 1 | "Bicycle" |
 2 | "Book"    |

Then I need a table for holding what sort of properties each object definition should have:

Table prop_defs

id | obj_def_id | name      | type |
------------------------------------
 1 |          1 | "Size"    |    ? |
 2 |          1 | "Name"    |    ? |
 3 |          1 | "Price"   |    ? |
 4 |          2 | "Title"   |    ? |
 5 |          2 | "Author"  |    ? |
 6 |          2 | "ISBN"    |    ? |

I would also need a table that holds each object:

Table objects

id | created    | updated    |
------------------------------
 1 | 2011-05-14 | 2011-06-15 |
 2 | 2011-05-14 | 2011-06-15 |
 3 | 2011-05-14 | 2011-06-15 |

Finally, I need a table that will hold the actual property values of each object, and one solution is for this table to have one column for each possible value type, such as this:

Table prop_vals

id | prop_def_id | object_id | numeric | textual | boolean |
------------------------------------------------------------
 1 |           1 |         1 |      27 |         |         |
 2 |           2 |         1 |         |  "Trek" |         |
 3 |           3 |         1 |    1249 |         |         |
 4 |           1 |         2 |      26 |         |         |
 5 |           2 |         2 |         |    "GT" |         |
 6 |           3 |         2 |     159 |         |         |
 7 |           4 |         3 |         |    "It" |         |
 8 |           5 |         3 |         |  "King" |         |
 9 |           6 |         4 |       9 |         |         |

If I implemented this schema, what would the "type" column of the prop_defs table hold? Integers that each map to a column name, varchars that simply hold the column name? Any other possibilities? Would a stored procedure help me out here in some way? And what would the SQL for fetching the "name" property of object 2 look like?

Upvotes: 18

Views: 10160

Answers (2)

Lord Tydus
Lord Tydus

Reputation: 544

You must accept that relational databases are not good at providing this kind of functionality. They CAN provide it, but are not good at it. (I hope I'm wrong). Relational databases lend themselves better to defined interfaces, not changing interfaces.

--EAV tables give dynamic fields but suck on performance. Sucks on indexing. And it is complex to query. It gets the job done in many situations, but can fall apart on big tables with lots of users hitting the system.

--"Regular" tables with several place holder columns are OK for performance, but you get non-descriptive column names and are limited in the number of columns you can "add". Also it does not support sub-type separation.

--Typically you create/modify tables at development time, not run time. Should we really discriminate against modifying the database at run time? maybe, maybe not. Creating new tables, foreign keys, and columns at run-time can achieve true dynamic objects, while giving the performance benefits of "regular" tables. But you would have to query the schema of the database, then dynamically generate all of your queries. That would suck. It would totally break the concept of tables as an interface.

Upvotes: 6

Ken Downs
Ken Downs

Reputation: 4827

You are implementing something called Entity-Attribute-Value model http://en.wikipedia.org/wiki/Entity-attribute-value_model.

Lots of folks will say it's a bad idea (usually I am one of those) because the answer to your last question, "What would the SQL for fetching..." tends to be "thick hairy and nasty, and gettting worse."

These criticisms tend to hold once you allow users to start nesting objects inside of other objects, if you do not allow that, the situation will remain manageable.

For your first question, "what would the "type" column of the prop_defs table hold", everything will be simpler if you have a table of types and descriptions that holds {"numeric","Any Number"}, {"textual","String"}, etc. The first value is the primary key. Then in prop_defs your column "type" is a foreign key to that table and holds values "numeric", "textual", etc. Some will tell you incorrectly to always use integer keys because they JOIN faster, but if you use the values "numeric", "textual" etc. you don't have to JOIN and the fastest JOIN is the one you don't do.

The query to grab a single value will have a CASE statement:

SELECT case when pd.type = "numeric" then pv.numeric
            when pd.type = "textual" then pv.textual
            when pd.type = "boolean" then pv.boolean
  from prov_vals pv 
  JOIN prop_defs pd ON pv.prop_def_id = pv.id
 WHERE pv.object_id = 2
   AND pd.name = "Name"

Upvotes: 30

Related Questions