Andrei Bârsan

Reputation: 3523

Many highly similar objects in the same database table

Hello, Stack Overflow community!

I am working on a rather large database-driven web application. The underlying database is growing in complexity as more components are being added, but so far I've had absolutely no trouble normalizing the data quite nicely.

However, the final component requires a table that holds products. Each product has a category and, depending on the category, different fields. Making a separate table for each product category doesn't seem right: there are currently five types, and they still have quite a lot of fields in common, though in irregular ways. A few general fields such as description and price are common to all five categories, but some attributes are shared only between categories 1 and 2, others between 3, 4 and 5, and so on.

I'm trying to steer away from the EAV model for obvious performance reasons.

The thing is that the field structure differs somewhat (but not completely) depending on which product type the user wants to enter into the database. All types have a name and a general description, but other attributes such as "area covered" apply only to certain categories such as seeds and pesticides, and not to fuel, which would have a diesel/gasoline boolean and a bunch of other fuel-related attributes.

Should I just extract the core features in a table, and make another five for each category type? That would be a bit hard to expand in the future.

My current idea would be to have the product table contain all the fields from all the possible categories, and then have another table that describes which product type uses which of those fields.

product:        id | type | name | description | price | composition | area covered | etc.

fields:         id | name (contains a list of the fields in the above table)

product-fields: id | product_type | field_id (links a bunch of fields to the product table based on the product type)

I reckon this wouldn't be too slow and would be easy to search (no need to actually join the other tables, just perform the search on the main product table based on some inputs). It would also facilitate things like form generation and data validation with just one lightweight additional query/join: fetch a product from the db, join in a concatenated string listing the fields it actually uses, split that string, and display the proper form fields based on what it contains, i.e. the fields actually associated with that product.
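As a rough illustration of the layout described above, here is a minimal Python sketch using the standard-library sqlite3 module. The column names area_covered and is_diesel, the table name product_fields, the sample rows, and the helper fields_for_product are all assumptions made for the example, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id           INTEGER PRIMARY KEY,
    type         TEXT NOT NULL,
    name         TEXT NOT NULL,
    description  TEXT,
    price        REAL,
    composition  TEXT,     -- NULL for types that do not use it
    area_covered REAL,     -- NULL for types that do not use it
    is_diesel    INTEGER   -- NULL for types that do not use it
);
CREATE TABLE fields (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- name of a column in product
);
CREATE TABLE product_fields (
    id           INTEGER PRIMARY KEY,
    product_type TEXT NOT NULL,
    field_id     INTEGER NOT NULL REFERENCES fields(id)
);
""")

# Hypothetical sample data: which product type uses which optional column.
conn.executemany(
    "INSERT INTO fields (id, name) VALUES (?, ?)",
    [(1, "composition"), (2, "area_covered"), (3, "is_diesel")],
)
conn.executemany(
    "INSERT INTO product_fields (product_type, field_id) VALUES (?, ?)",
    [("seed", 1), ("seed", 2), ("pesticide", 1), ("pesticide", 2), ("fuel", 3)],
)
conn.execute(
    "INSERT INTO product (id, type, name, price, is_diesel) "
    "VALUES (1, 'fuel', 'Diesel 50L', 70.0, 1)"
)
conn.execute(
    "INSERT INTO product (id, type, name, price, composition, area_covered) "
    "VALUES (2, 'seed', 'Wheat seed', 25.0, 'wheat', 1.5)"
)


def fields_for_product(conn, product_id):
    """Return the sorted list of optional field names used by a product's type."""
    row = conn.execute(
        """
        SELECT GROUP_CONCAT(f.name)
        FROM product p
        JOIN product_fields pf ON pf.product_type = p.type
        JOIN fields f ON f.id = pf.field_id
        WHERE p.id = ?
        """,
        (product_id,),
    ).fetchone()
    # GROUP_CONCAT yields NULL when nothing matches; its order is not
    # guaranteed, so the names are sorted after splitting.
    return sorted(row[0].split(",")) if row and row[0] else []
```

With this sample data, fields_for_product(conn, 1) yields only the fuel-specific field, and a form generator could render inputs for exactly those names. Searching still touches only the product table.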

Thanks for your trouble! Andrei Bârsan

Upvotes: 1

Views: 659

Answers (3)

Should I just extract the core features in a table, and make another five for each category type? That would be a bit hard to expand in the future.

This is called a SuperType - SubType relationship. It works very well if most of your queries are one of two types:

  1. You mostly query the SuperType table and only infrequently drill down into a SubType table.
  2. You query the database after filtering to a specific SubType.
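A minimal sketch of a SuperType/SubType layout and the two query patterns listed above, using Python's standard-library sqlite3 module. The table names fuel and seed, their columns, and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- SuperType: fields common to every product
CREATE TABLE product (
    id    INTEGER PRIMARY KEY,
    type  TEXT NOT NULL,
    name  TEXT NOT NULL,
    price REAL
);
-- SubTypes: one row per product of that type, sharing the product's key
CREATE TABLE fuel (
    product_id INTEGER PRIMARY KEY REFERENCES product(id),
    is_diesel  INTEGER NOT NULL
);
CREATE TABLE seed (
    product_id   INTEGER PRIMARY KEY REFERENCES product(id),
    area_covered REAL NOT NULL
);
INSERT INTO product VALUES (1, 'fuel', 'Diesel 50L', 70.0),
                           (2, 'seed', 'Wheat seed', 25.0);
INSERT INTO fuel VALUES (1, 1);
INSERT INTO seed VALUES (2, 1.5);
""")

# Pattern 1: query only the SuperType table, no joins needed.
cheap = conn.execute(
    "SELECT name FROM product WHERE price < ? ORDER BY id", (50,)
).fetchall()

# Pattern 2: filter to one SubType, then a single join gives its extra fields.
diesel = conn.execute(
    """
    SELECT p.name, f.is_diesel
    FROM product p
    JOIN fuel f ON f.product_id = p.id
    WHERE f.is_diesel = 1
    """
).fetchall()
```

Queries that need to search across several SubTypes at once would require a join or union per SubType, which is where this layout becomes less convenient.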

Upvotes: 1

MatBailie

Reputation: 86716

EAV can actually be quite good at storing data and fetching that data back again when you know the key. It also excels in its ability to add fields without changing the schema. But where it's quite poor is when you need the equivalent of WHERE field1 = x AND field2 = y.
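To make the contrast concrete, here is a small sketch using Python's standard-library sqlite3 module. The table product_attr, its columns, and the sample rows are hypothetical. Fetching everything for a known key is a single simple query, while a two-field filter needs one self-join per additional field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A bare-bones EAV table (hypothetical names): one row per attribute value
CREATE TABLE product_attr (
    product_id INTEGER NOT NULL,
    attr       TEXT NOT NULL,
    value      TEXT,
    PRIMARY KEY (product_id, attr)
);
INSERT INTO product_attr VALUES
    (1, 'type', 'fuel'), (1, 'is_diesel', '1'),
    (2, 'type', 'fuel'), (2, 'is_diesel', '0'),
    (3, 'type', 'seed'), (3, 'area_covered', '1.5');
""")

# Easy case: the key is known, so one lookup returns every attribute.
attrs = dict(
    conn.execute(
        "SELECT attr, value FROM product_attr WHERE product_id = ?", (1,)
    ).fetchall()
)

# Hard case: the equivalent of WHERE type = 'fuel' AND is_diesel = '1'.
# Each extra field in the filter adds another self-join on the same table.
matches = conn.execute(
    """
    SELECT a.product_id
    FROM product_attr a
    JOIN product_attr b ON b.product_id = a.product_id
    WHERE a.attr = 'type'      AND a.value = 'fuel'
      AND b.attr = 'is_diesel' AND b.value = '1'
    ORDER BY a.product_id
    """
).fetchall()
```

Note also that every value is stored as text here, so numeric comparisons and per-field indexes are harder to express than with ordinary typed columns.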

So while I agree the data behaviour is important (how many products share the same fields, etc), the use of that data is also important.

  • Which fields need searching, which fields are always just data storage, etc

In most cases I'd suggest keeping all fields that need searching, in combination with each other, in the same table.

In practice this often leads to a single table solution.

  • New fields require schema changes, new indexes, etc
  • Potential for sparsely populated data, using more space than is 'required'
  • Allows simple queries, simple indexing and often the fastest queries
  • Often, though not always, the space overhead is marginal

Where the sparse-data overheads reach a critical point, I would then head towards additional tables grouped by what fields they contain. More specifically, I would not create tables by product. This is on the dual assumption that most/all fields will be shared across at least some products, and that those fields will need searching.

This gives a schema more like...

Main_table ( PK, Product_Type, Field1, Field2, Field3 )
Geo_table  ( PK, county, longitude, latitude )
Value      ( PK, cost, sale_price, tax )
etc

You may also have a meta-data table describing which product types have which fields, etc.

What this schema allows is a more densely populated set of tables, which can be easily indexed and so quickly searched, while minimising table clutter and joins by grouping related fields.
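A sketch of how such field-grouped tables might be created and searched, using Python's standard-library sqlite3 module. The name column, the table name value_table (standing in for Value), the index, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_table (
    pk           INTEGER PRIMARY KEY,
    product_type TEXT NOT NULL,
    name         TEXT
);
-- Only products that have geographic fields get a row here.
CREATE TABLE geo_table (
    pk        INTEGER PRIMARY KEY REFERENCES main_table(pk),
    county    TEXT,
    longitude REAL,
    latitude  REAL
);
-- Only products that have pricing fields get a row here.
CREATE TABLE value_table (
    pk         INTEGER PRIMARY KEY REFERENCES main_table(pk),
    cost       REAL,
    sale_price REAL,
    tax        REAL
);
CREATE INDEX idx_geo_county ON geo_table (county);

INSERT INTO main_table VALUES (1, 'seed', 'Wheat seed'),
                              (2, 'fuel', 'Diesel 50L'),
                              (3, 'seed', 'Corn seed');
INSERT INTO geo_table VALUES (1, 'Kent', 0.5, 51.2),
                             (3, 'Essex', 0.6, 51.8);
INSERT INTO value_table VALUES (1, 10.0, 25.0, 5.0),
                               (2, 50.0, 70.0, 14.0),
                               (3, 12.0, 30.0, 6.0);
""")

# Searching across two field groups costs one join per group involved.
found = conn.execute(
    """
    SELECT m.name
    FROM main_table m
    JOIN geo_table g   ON g.pk = m.pk
    JOIN value_table v ON v.pk = m.pk
    WHERE g.county = ? AND v.sale_price < ?
    ORDER BY m.pk
    """,
    ("Kent", 30),
).fetchall()
```

Product 2 has no row in geo_table at all, which is how this layout avoids storing empty geographic columns for products that never use them.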


In the end, there isn't one true answer; it's all a balancing act. My general rule of thumb is to stay with a single table until I actually have a real and pressing reason not to, not just a theoretical one.

Upvotes: 2

dbrin

Reputation: 15673

In my experience, unless you are writing a complete framework that can render fully described fields (we are talking about a lot of metadata describing each field), it is not worth separating field definitions from the main object. Modern frameworks (like Grails) make adding a new column to a domain/model class and its table virtually painless.

If your common field overlap is about 80% between all object types, I would put them all in one table and use the Table per Hierarchy inheritance model, where a discriminator field helps you tell your object types apart. On the other hand, if you have only about 20% overlap of common fields, go with the Table per Class inheritance model, with a base class and table containing the common fields and the other joined tables hanging off the base.
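For the Table per Hierarchy case, a hand-rolled sketch in Python (standard-library sqlite3 and dataclasses) shows what the discriminator does. A framework such as Grails would do this mapping for you; the class names, column names, and the load helper here are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Product:
    id: int
    name: str
    price: float


@dataclass
class Fuel(Product):
    is_diesel: bool


@dataclass
class Seed(Product):
    area_covered: float


conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table for the whole hierarchy; 'type' is the discriminator column.
CREATE TABLE product (
    id           INTEGER PRIMARY KEY,
    type         TEXT NOT NULL,
    name         TEXT NOT NULL,
    price        REAL NOT NULL,
    is_diesel    INTEGER,   -- used by fuel rows only
    area_covered REAL       -- used by seed rows only
);
INSERT INTO product VALUES
    (1, 'fuel', 'Diesel 50L', 70.0, 1, NULL),
    (2, 'seed', 'Wheat seed', 25.0, NULL, 1.5);
""")


def load(conn, product_id):
    """Load one row and build the class selected by the discriminator."""
    row = conn.execute(
        "SELECT id, type, name, price, is_diesel, area_covered "
        "FROM product WHERE id = ?",
        (product_id,),
    ).fetchone()
    if row is None:
        return None
    pid, ptype, name, price, is_diesel, area_covered = row
    if ptype == "fuel":
        return Fuel(pid, name, price, bool(is_diesel))
    if ptype == "seed":
        return Seed(pid, name, price, area_covered)
    return Product(pid, name, price)  # unknown type: common fields only
```

In the Table per Class variant, the subtype-specific columns would instead live in separate tables joined to the base table by the product id.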

Upvotes: 1
