Reputation: 360

Alternative Database Design to EAV

I need to model a persons database in a relational database (MySQL).

Each person has properties. There are properties which do have only a 1:1 relation (e.g. gender male/female) and others which do have 1:n relations such as sports or language (e.g. a person might play basketball and football and speak english and german ). Additionaly these 1:n relation have a skill level (e.g. professional, beginner).

Currently I am asking myself if there is a better approach than the EAV model to model the person - properties relation. My concern is primarily to make filtering for the persons having special properties easier (e.g. all actors which are male and (do play basketball professionally OR are beginners) AND do speak english professionally. It should be possible to easily add new properties (whereas this must not necessarily be completely dynamic). It is ok if this needs to be done by a developer and a table needs to be altered. But it should not be to difficult (e.g. modifiying sql statements, adding joins, adding database tables/lookup tables).

I would go for a classical column based design, having the 1:1 properties in the persons table in a separate column for each property. I am uncertain which is the best way to model the 1:n relation in such a design. I would like to avoid to have a lookup and separate table for each 1:n property.

The best approach still seems to be the following EAV approach:

A Persons table having the columns id,name e.g.

1 | Christian

A Properties table having the columns person_id, property, value, level, e.g:

1 | gender   | male       |
1 | sports   | basketball | professional
1 | sports   | football   | beginner
1 | language | english    | professional
1 | language | german     | basic

Upvotes: 0

Answers (2)

Neville Kuyt

Reputation: 29629

EAV is most suitable when you don't know the schema of your data in advance, and you don't want to have a developer modify your system for every new set of data.

From what you've said, that's not the case here.

EAV has many drawbacks - for instance, you can't rely on the built-in relational model to validate your schema. So, if your user "Christian" doesn't have a value specifying his gender, your application just has to deal with it - whereas in a traditional schema, you have gender column which you declare "not null", and link to a "genders" lookup table. This is a big deal for most applications - enforcing the validity of your data at the application level is non-trivial.

The second big drawback with EAV is that queries which are easy using SQL become hideously complex, and performance degrades rather quickly, because every item in a where clause (e.g. "where gender = 'm'") becomes a subquery.

So, I would definitely model the data whose schema you know as a "traditional" database. If you want to avoid lookup tables, you can - instead of having a foreign key to the "gender" table, you can just rely on your application to know that the valid options are "m" and "f" (don't forget to deal with the weirdness that can happen here - is "M" valid, instead of "m"?).

In order to model 1:n relationships, you can create a separate table, e.g. "person_sports", with a foreign key relationship to the "person" table. You can, and perhaps should, also have a lookup table for "sports", in which case you have a many-to-many relationship.

Upvotes: 5

Gordon Linoff

Reputation: 1269753

The answer to your question really depends on what is going to happen to the database over time. Here are some questions to ask:

How often will new attributes be added?
How often will new persons be added?
Will new persons be added in bulk or one at a time?
Will the retrievals tend to be more of all attributes for a person or just a few attributes for lots of persons?

If you have a development period where features are being added, and then the data structure will be stabilized during the development period, use a traditional E-R approach. During development, adding new columns is not particularly onerous.

Also, if you are planning on handling millions of people, with dozens or hundreds of attributes, then consider the issue of performance. This might dissuade you from EAV.

There is an alternative database approach which works very well when you are adding persons in bulk and retrieving just a few attributes at a time. Once upon a time, this was called vertical partitioning but now seems to go by the name of column partitioning. In this, you would store the different attributes in different tables. The 1-1 attributes would have the same primary key, which should make the joins go very quickly in memory -- almost an unnoticeable performance hit. For the 1-n attributes, you would want a composite primary key, with the person as the first element, and data pages that are not full by default (this allows you to do updates on the same data page).

Adding a new attribute would then simply require adding a new table to store it, populating it for existing persons, and changing any views on the database to use it.

Some commercial databases are specialized for such structures (such as Vertica), but they are much more expensive than mysql.

Upvotes: 0

Alternative Database Design to EAV

Answers (2)

Related Questions