user3541631
user3541631

Reputation: 4008

Postgres database: how to model multiple attributes that can have also multiple value, and have relations to other two entities

I have three entities, Items, Categories, and Attributes.

An Item can be in one or multiple Categories, so there is N:M relation.

Item               ItemCategories                   Categories     

id name            item_id category_id             id    name     
1  alfa               1         1                   1   chipset
                      1         2                   2   interface 

An Item can have multiple Attributes depending on the 'Categories' they are in.

For example, the items in Category 'chipset' can have as attributes: 'interface', 'memory' 'tech'.

These attributes have a set of predefined values that don't change often, but they can change.

For example: 'memory' can only be ddr2, ddr3, ddr4.

Attributes                                 CategoryAttributes

id name     values                          category_id attribute_id    
1  memory   {ddr2, ddr3, ddr4}              1              1

An Item that is in the 'chipset' Category has access to the Attribute and can only have Null or the predefined value of the attribute.

I thought to use Enum or Json for Attribute values, but I have two other conditions:

ItemAttributes

item_id attribute_id   value
 1        1          {ddr2, ddr4}

1) If an Attribute appears in 2 Categories, and an Ithe is in both categories, only once an attribute can be shown.

2) I need to use the value with rank, so if two corresponding attribute values appear for an item, the rank should be greater if it is only one, or the value doesn't exist.

3)Creating separate tables for Attributes is not an option, because the number is not fixed, and can be big.

So, I don't know exactly the best options in the database design are to constrain the values and use for order ranking.

Upvotes: 1

Views: 2461

Answers (1)

Carlos Robles
Carlos Robles

Reputation: 10947

The problem you are describing is a typical open schema or vertical database, which is a classic use case for some kind of EAV model.

EAV is a complex yet powerful paradigm that allows a potentially open schema while respecting the database normal forms and allows to have what you need: having a variable number of attributes depending on specific instances of the same entity.

That is what happens typically in e-commerce using relational database since different products have different attributes (i.e a lipstick has color, but maybe for a hard drive you dont care about color but about capacity) and it doesn't make sense to have one attribute table, because the number is not fixed and can be big, and for most rows, there would be a lot of NULL values (that is the mathematical notion of a sparse matrix, that looks very ugly in a DB table)

You can take a look at Magento DB Model, a true reference in pure EAV at scale, or Wikipedia, but probably you can do that later, and for now, you just need the basics:

The basic idea is to store attributes, and their corresponding values as rows, instead of columns, in a single table.

In the simpler implementation the table has at least three columns: entity (usually a foreign key to an entity, or entity type/category), attribute (this can be a string, o a foreign key in more complex systems), and value.

In my previous example, oversimplifying, we could have a table like this, that lists attribute names and its values for

Item table                 Attributes table

+------+--------------+   +-------------+-----------+-------------+
|   id | name         |   |     item_id | attribute | value       |
+------+--------------+   +-------------+-----------+-------------+
|    1 | "hard drive" |   |           2 |   "color" |       "red" | 
+------+--------------+   +-------------+-----------+-------------+
|    2 | "lipstick"   |   |           2 |   "price" |          10 | 
+------+--------------+   +-------------+-----------+-------------+
                          |           1 | "capacity"|       "1TB" | 
                          +-------------+-----------+-------------+
                          |           1 |   "price" |         200 | 
                          +-------------+-----------+-------------+

 So for every item, you can have a list of attributes. 

Since your model is more complex, has a few more constraints, so we need to adapt this model.

  • Since you want to limit the possible values, you will need a table for values
  • Since you will have a values table, the values hast to refer to an attribute, so you need the attributes to have an id, so you will have an attribute table
  • to make explicit and strict what categories have what attribute, you need a category-attribute table

With this, you end up with something like

Categories table List of categories ids and names

+------+--------------+ 
|   id | name         |  
+------+--------------+ 
|    1 | "chipset"    | 
+------+--------------+ 
|    2 | "interface"  | 
+------+--------------+ 

Attributes table List of attribute ids and their name

+------+--------------+ 
|   id | name         |  
+------+--------------+ 
|    1 | "interface"  | 
+------+--------------+ 
|    2 | "memory"     | 
+------+--------------+ 
|    3 | "tech"       | 
+------+--------------+
|    4 | "price"      | 
+------+--------------+

Category-Attribute table What category has what attributes. Note that one attribute (i.e 4) can belong to 2 categories

+--------------+--------------+ 
| attribute_id | category_id  |  
+--------------+--------------+ 
|            1 |            1 | 
+--------------+--------------+ 
|            2 |            1 | 
+--------------+--------------+ 
|            3 |            1 | 
+--------------+--------------+ 
|            4 |            1 | 
+--------------+--------------+ 
|            4 |            2 | 
+--------------+--------------+ 

Value table List of possible values for every attribute

+----------+--------------+--------+
| value_id | attribute_id | value  |
+-------------+-----------+--------+
|        1 |            2 | "ddr2" | 
+----------+--------------+--------+
|        2 |            2 | "ddr3" | 
+----------+--------------+--------+
|        3 |            2 | "ddr4" | 
+----------+--------------+--------+
|        4 |            3 |"tech_1"| 
+----------+--------------+--------+
|        5 |            3 |"tech_2"| 
+----------+--------------+--------+
|        6 |     ...      |   ...  | 
+----------+--------------+--------+
|        7 |     ...      |   ...  | 

And finally, what you can imagine, the

Item-Attribute table will list one attribute value per row

+----------+--------------+-------+
| item_id  | attribute_id | value |
+----------+-----------+----------+
|        1 |            2 |     1 | 
+----------+--------------+-------+
|        1 |            2 |     3 | 
+----------+--------------+-------+

Meaning that item 1, for attribute 2 (`memory`), has values 1 and 3 (`ddr2` and `ddr3`)

This will cover all your conditions:

  • Number of attributes is unlimited, as big as needed and not fixed
  • You can define clearly what category has what attributes
  • Two categories can have the same attribute
  • If 1 item belongs to two categories that have the same attribute, you can show only one (ie SELECT * from Category-Attribute where category_id in (SELECT category_id from ItemCategories where item_id = ...) will give you the list of eligible attributes, only one of each even if 2 categories had the same
  • You can do a rank, I think I dont have enough info for this query, but being this a fully normalized model, definitely, you can do a rank. You have here pretty much the full model, so surely you can figure out the query.

This is very similar to the model that Magento uses. It is very powerful but of course, it can get hard to manage, but it is the best way if we want to keep the model strict and make sure that it will enforce the constraints and that will accept all the SQL functions. For systems less strict, it is always an option to go for a NoSQL database with much more flexible schemas.

Upvotes: 5

Related Questions