Adam Baranyai
Adam Baranyai

Reputation: 3867

Storing attributes with multiple values in relational database

I am storing product attributes in a relational table in a MariaDB database the following way:

I have a main table, called Products which provide the name, description, and other simple information about a product, and another table, ProductAttributes, with the following structure: Id|ProductId|Attribute|Value where Id is an autoincremented primary key, and ProductId is a reference to a row in the Products table.

I can store simple attribute value relations to a product in this way, say ie, height, weight, length of a product. My problems start, when a product's attribute, ie color can have multiple possible values.

I could add multiple lines to the ProductAttributes table when storing multi-valued attributes, ie:

1|yy|color|red
2|yy|color|blue

and from this schema, I could easily retrieve a single product's attributes, but I am having trouble on how to go forward when trying to compare two products based on their attributes.

Is there any other way to store multiple values for a single attribute in a relational database to maintain their searchability?

As of now, to find similar attributed products I am doing a similar query:

SELECT * FROM ProductAttributes base 
 INNER JOIN ProductAttributes compare ON compare.ProductId != base.ProductId
 WHERE base.Attribute = compare.Attribute
   AND base.Value = compare.Value 
   AND base.ProductId = 'x' 
 GROUP BY compare.ProductId

My problem is, that this query will return the products with a red and blue color, as similar to products with a blue color.

Btw, I can not change my attributes tables to a one attribute per column representation, because I do not know from the get-go how many attributes will I have, and even if I knew, I have way too many possible attributes and differences on each product category, to represent this in a traditional table.

A possible pitfall is, that I also want to compare products to one another with missing attributes. Ie, if a product has a length attribute specified, but another one has no length attribute, they could still be similar. Right now, to make this kind of comparison, in the background, I am transposing my attributes table, to a simple table, and on that table, perform this query:

SELECT b.ProductId as BaseProduct, s.ProductId as SimProduct 
  FROM tmp_transposed_product_attributes b 
 CROSS JOIN tmp_transposed_product_attributes s ON b.ProductId != s.ProductId 
 WHERE (b.attribute1 = s.attribute1 OR b.attribute1 IS NULL OR s.attribute1 IS NULL) 
   AND (b.attribute2 = s.attribute2 OR b.attribute2 IS NULL OR s.attribute2 IS NULL) ...

Upvotes: 0

Views: 963

Answers (1)

Paul T.
Paul T.

Reputation: 4907

If I'm following correctly for the product comparison, I like to use EXISTS or NOT EXISTS to help find things like that, which may also help avoid having to transpose the data.

For example, given this sample table data:

MariaDB [test]> select * from productattributes;
+----+-----------+-----------+-------+
| id | productID | attribute | value |
+----+-----------+-----------+-------+
|  1 | yy        | height    | 5     |
|  2 | yy        | color     | red   |
|  3 | yy        | weight    | 10    |
|  4 | yy        | length    | 6     |
|  5 | yy        | color     | blue  |
|  6 | zz        | color     | white |
|  7 | zz        | height    | 5     |
|  8 | zz        | length    | 8     |

+----+-----------+-----------+-------+
8 rows in set (0.00 sec)

To find all similar attributes between the two, but has different values (removes attribute/values pairs that are the same) use a NOT EXISTS query to same table like so:

MariaDB [test]> SELECT * FROM `productattributes` pA
    ->  WHERE productID IN ('yy', 'zz')
    ->    AND NOT EXISTS (SELECT * FROM productattributes pB
    ->                     WHERE pA.attribute = pB.attribute
    ->                       AND pA.value = pB.value
    ->                       AND pA.productID != pB.productID)
    ->  ORDER BY productID, attribute;
+----+-----------+-----------+-------+
| id | productID | attribute | value |
+----+-----------+-----------+-------+
|  2 | yy        | color     | red   |
|  5 | yy        | color     | blue  |
|  4 | yy        | length    | 6     |
|  3 | yy        | weight    | 10    |
|  6 | zz        | color     | white |
|  8 | zz        | length    | 8     |
+----+-----------+-----------+-------+
6 rows in set (0.00 sec)

Then to find attribute/value pairs that ARE the same between the two, simply remove the NOT portion of the query:

MariaDB [test]> SELECT * FROM `productattributes` pA
    ->  WHERE productID IN ('yy', 'zz')
    ->    AND EXISTS (SELECT * FROM productattributes pB
    ->                 WHERE pA.attribute = pB.attribute
    ->                   AND pA.value = pB.value
    ->                   AND pA.productID != pB.productID)
    ->  ORDER BY productID, attribute;
+----+-----------+-----------+-------+
| id | productID | attribute | value |
+----+-----------+-----------+-------+
|  1 | yy        | height    | 5     |
|  7 | zz        | height    | 5     |
+----+-----------+-----------+-------+
2 rows in set (0.00 sec)

Here's the query without the command line junk:

SELECT * FROM `productattributes` pA
 WHERE productID IN ('yy', 'zz')
   AND NOT EXISTS (SELECT * FROM productattributes pB
                    WHERE pA.attribute = pB.attribute
                      AND pA.value = pB.value 
                      AND pA.productID != pB.productID)
 ORDER BY productID, attribute;

EDIT:
To cover the case where there is an attribute that is in one but not the other, then the value check of the query can be removed:

MariaDB [test]> SELECT * FROM `productattributes` pA
    ->  WHERE productID IN ('yy', 'zz')
    ->    AND NOT EXISTS (SELECT * FROM productattributes pB
    ->                     WHERE pA.attribute = pB.attribute
    ->                       AND pA.productID != pB.productID)
    ->  ORDER BY productID, attribute;
+----+-----------+-----------+-------+
| id | productID | attribute | value |
+----+-----------+-----------+-------+
|  3 | yy        | weight    | 10    |
+----+-----------+-----------+-------+
1 row in set (0.00 sec)

Upvotes: 1

Related Questions