Reputation: 59
I need a new perspective on how to design a reliable and efficient SQL database to store multi-level arrays of data.
This problem applies to many situations but I came up with this example:
There are hundreds of products. Each product has an undefined number of parts. Each part is built from several elements.
All products are described in the same way. All parts would require the same fields to describe them (let's say: price, weight, part name), all elements of all parts also have uniform design (for example: element code, manufacturer). Plain and simple.
One element may be related to only part, and each part is related to one product only.
I came up with idea of three tables:
Products:
--------------------------------------------
prod_id prod_name prod_price prod_desc
1 hoover 120 unused
next
Parts:
----------------------------------------------------
part_id part_name part_price part_weight prod_id
3 engine 10 20 1
and finally
Elements:
---------------------------------------
el_id el_code el_manufacturer part_id
1 BFG12 GE 3
Now, select a desired product, select all from PARTS where prod_id is the same, and then select all from ELEMENTS where part_id matches - after multiple queries you've got all data.
I'm just not sure if this is the right approach.
I've got also another idea, without ELEMENTS table.
That would decrease queries but I'm a bit afraid it might be lame and bad practice.
Instead of ELEMENTS table there are two more fields in the PARTS table, so it looks like this:
part_id, part_name, part_price, part_weight, prod_id, part_el_code, part_el_manufacturer
they would be text type, and for each part, information about elements would be stored as strings, this way:
part_el_code | code_of_element1; code_of_element2; code_of_element3
part_el_manufacturer | manuf_of_element1; manuf_of_element2; manuf_of_element3
Then all we need is to explode() data from those fields, and we get arrays, easy to display. Of course this is not perfect and has some limitations, but is this idea ok?
The reason I came up with the second variant is that the third table - Elements - would eventually become quite big. If there is 10,000 products, 4 parts for every product, and on average 3 elements per part - it means there must be 120,000 rows in the Elements table. And to be honest - I don't really know if that would contribute to performance problems.
Should I just go with the first idea? Or maybe there is a better approach to this problem?
Upvotes: 1
Views: 297
Reputation: 70538
Describing relationships between things is exactly what relational databases do. Any book on SQL should give many examples and I suggest looking for such a book to understand these issues better but I will give a quick answer to your question.
You are on the right track for the simple solution with your 3 tables, but you don't need three select statements. That is what the join
is used for. With your example the following select statement would give you all the data for a particular product.
select *
from product
join parts on parts.prod_id = product.prod_id
join elements on elements.part_id = part.parts_id
where product.prod_id = x
This is fine and will do exactly what you require.
As far as you 2nd idea, it is really bad. You are working with a relational database, it allows you to store sets of data and relate it. To store a list in a string value is to ignore the exact functionality that relational databases are designed to do well.
However, there is a way to improve your design. I think you might not always have a one-to-one relationship for product and part or part and element. So I would suggest the following.
First remove prod_id from the parts table and then remove part_id from the elements table.
Then create two "joining tables". Lets call them prodparts and partselement. They would look like this:
prodparts would contain two columns prod_id and part_id (and maybe a count?)
partselement would contain two columns part_id and el_id (and maybe a count?)
Now each row in these tables would describe the relationship. BUT and this is the neat part, multiple products could use the same part and multiple parts could use the same element.
The select is only slightly more complicated and looks like this:
select *
from product
join prodparts on procparts.prod_id = product.prod_id
join parts on parts.part_id = prodparts.part_id
join partselement on partelement.part_id = parts.part_id
join elements on elements.el_id = partelement.parts_id
where product.prod_id = x
Upvotes: 2
Reputation: 23123
You're first approach seems quite solid. The second approach would make it difficult to maintain the data of the elements. One thing I noticed in the first design is that the product and the parts both have a "price". Do they mean something different or does the sum of the parts prices equal the product's final price? If so, I recommend removing the price field at the product level and use the sum() function at the parts level.
Other than that... good job!
Upvotes: 0