Caveats of EAV vs. direct table manipulation?

Question

I'm working on a small content management system and the data within it is heavily relational. I've started with a small abstraction, but i'm probably going for a full entity-attribute-value model. However, it occurred to me that i'm building all of this on a relational database (pgsql) and it doesn't make any sense to rebuild the model on top of a powerful engine like that.

This is not very common though, which makes me think someone already thought of this and it might be a trap. Could someone explain to me the caveats of creating and altering tables inside an application to represent your user-customizable data model?

Neville Kuyt · Accepted Answer

I assume you've looked at all the down-sides of EAV.

You are very terse in describing what you want to achieve, so I'm going to have a guess - you want some way of allowing your developers and/or content administrators to define a schema, and to allow the software to reason about the data based on that schema. For instance, show all entities of type 'article' belonging to category 'architecture' in status 'published' order by 'publication_date' descending.

That's an easy query to write if you have a relational model and know the attributes types at the time of writing the query.

It's a hard query to write using EAV.

It's also a hard query to write in a relational model if you don't know the attributes in advance - you basically have to write a translation layer between your domain model and the relational database. I've worked with CMS systems which did this, and the limitation they usually impose is "forward-only" changes. You can add attributes, but not remove or change them.

This is because the database is the easy bit - it's mapping that data schema to the domain model that's hard, and the generators they use are pretty good at adding new attributes, but get really confused when renaming things, or changing data types, or removing them. For instance, if you remove attribute "publication_date", you'd have to check all the generated files to see if it occurs, and then transitively look for methods depending on those methods etc.

EDIT:

You write that you want content administrators to create new entities, and relationships between those entities. The long and short of it is that relational databases are not great at this. The closest parallel is object-relational-mapping tools which aim to abstract the underlying database from the object-oriented application talking to it. ORMs are firmly aimed at developers, not end users, and introduce their own cognitive overhead; they tend to work really well until you run into an edge case.

You might look at using a document-oriented solution (MySQL supports both XML and JSON natively) instead. You lose some of the validation and scalability features of the pure relational model, but you gain a LOT of simplicity. For instance, you might have a table called "content_item", with all the fixed attributes that apply to all content items and allow you to implement workflow logic, but store the content itself in a JSON document within the content_item table.

Caveats of EAV vs. direct table manipulation?

Answers (1)

Related Questions