Egor4eg
Egor4eg

Reputation: 2708

Best approach to store data which attributes can vary

Please, read first my previous question: T-SQL finding of exactly same values in referenced table

The main purpose of this question is to find out if this approach of storing of data is effective.

Maybe it would be better to get rid of PropertyValues table. And use additional PropertyValues nvarchar(max) column in Entities table instead of it. For example instead of

EntityId  PropertyId  PropertyValue
1         4           Val4
1         5           Val5
1         6           Val6

table, I could store such data in PropertyValues column: "4:Val4;5:Val5;6Val6"

As an alternative, I could store XML in PropertyValues column....

What do you think about the best approach here?

[ADDED] Please, keep in mind:

  1. Set of properties must be customizable
  2. Objects will have dozens of properties (approximately from 20 to 120). Database will contain thousands of objects

[ADDED] Data in PropertyValues table will be changed very often. Actually, I store configured products. For example, admin configures that clothes have attributes "type", "size", "color", "buttons type", "label type", "label location" etc... User will select values for these attributes from the system. So, PropertyValues data cannot be effectively cached.

Upvotes: 1

Views: 416

Answers (3)

Dalex
Dalex

Reputation: 3625

I work with the similar project (web-shop generator). So every product has attribute and every attribute has set of values. It is different tables. And for all of this there are translations in several languages. (So exists additional tables for attributes and values translations).

Why we choose such solution? Because for every client there should be database with the same scheme. So such database scheme is very elastic.

So what about this solution. As always, "it depends" -))

  1. Storage. If your value will be used often for different products, e.g. clothes where attribute "size" and values of sizes will be repeated often, your attribute/values tables will be smaller. Meanwhile, if values will be rather unique that repeatable (e.g. values for attribute "page count" for books), you will get a big enough table with values, where every value will be linked to one product.
  2. Speed. This scheme is not weakest part of project, because here data will be changed rarely. And remember that you always can denormalize database scheme to prepare DW-like solution. You can use caching if database part will be slow too.
  3. Elasticity This is the strongest part of solution. You can easily add/remove attributes and values and ever to move values from one attribute to another!

So answer on your question is not simple. If you prepare elastic scheme with unknown attributes and values, you should use different tables. I suggest to you remember about storing values in CSV strings. It is better to store it as XML (typed and indexed).

UPDATE

I think that PropertyValues will not change often , if comparing with user orders. But if you doubt, you should use denormalization tables or indexed views to speed up.Anyway, changing XML/CSV on large quantity of rows will have poor performance, so "separate table" solution looks good.

Upvotes: 2

Remus Rusanu
Remus Rusanu

Reputation: 294417

The SQL Customer Advisory Team (CAT) has a whitepaper written just for you: Best Practices for Semantic Data Modeling for Performance and Scalability. It goes through the common pitfalls of EAV modeling and recommends how to design a scalable EAV solution.

Upvotes: 1

Yuck
Yuck

Reputation: 50855

You will hate yourself later if you implement a solution using multi-value attributes (i.e. 4:Val4;5:Val5;6Val6).

XML is marginally better because there are XQuery functions to help you pull out and parse the values. But the XML type is implemented as a CLR type in SQL Server and it can get extremely slow to work with.

The best solution to this problem is one like you have. Use the sql_variant type for the column if it could be any number of data types. Ideally you'd refactor this into multiple tables / entities so that the data type can be something more concrete.

Upvotes: 3

Related Questions