Reputation: 1514
I have a question about the performance of my MySQL database design.
Table A has a lot of records, say a million, and table B also has a million. There is another table C in which every record id of A is connected to every row of B, and this connection carries an additional value, 1 or 0. So functionally speaking, every record in A has a boolean vector, where B contains the 'variables' of the vector and 1 or 0 is the value. It's shown more graphically in the image at the bottom.
Table C will see a lot of write and read actions (e.g. selecting all values for one record of A), so the table is used very actively. And table C is really long: a million times a million rows.
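Schematically the design looks roughly like this (just my sketch: InnoDB, integer IDs, and the column names a_id, b_id and value are illustrative, not the real schema):

-- illustrative sketch only; actual names and types may differ
CREATE TABLE a (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE b (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- one row per (A, B) pair, carrying the boolean value
CREATE TABLE c (
  a_id  INT UNSIGNED NOT NULL,
  b_id  INT UNSIGNED NOT NULL,
  value TINYINT(1) NOT NULL,
  PRIMARY KEY (a_id, b_id),
  FOREIGN KEY (a_id) REFERENCES a (id),
  FOREIGN KEY (b_id) REFERENCES b (id)
) ENGINE=InnoDB;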
Upvotes: 2
Views: 161
Reputation: 95072
The table design is fine and shouldn't be a problem, because you access records via IDs, which should be indexed. Depending on your typical queries you should also consider adding composite indexes (c(a_id,b_id), c(a_id,value), c(b_id,value), c(a_id,b_id,value)).
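A sketch of what adding those indexes could look like (the index names are placeholders; assuming the primary key is (a_id, b_id), a plain (a_id, b_id) index is normally already covered):

ALTER TABLE c
  ADD INDEX idx_a_value   (a_id, value),
  ADD INDEX idx_b_value   (b_id, value),
  ADD INDEX idx_a_b_value (a_id, b_id, value);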
However, as there are only two states, 0 and 1, you may decide to store only one of them. That is, if you store only the state-1 records, all pairs not in the table implicitly have state 0. This pays off especially when the states are unevenly distributed (say 90% of the records have state 0 and only 10% have state 1) or when you usually access only one of the states (e.g. you always look for 1s).
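A rough sketch of that variant, reusing the names from the sketch above (the table name c_ones and the example IDs are placeholders): a row's presence means 1, its absence means 0, so the value column disappears entirely.

CREATE TABLE c_ones (
  a_id INT UNSIGNED NOT NULL,
  b_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (a_id, b_id)
) ENGINE=InnoDB;

-- "Is the flag set for this pair?" -> the row either exists or it does not
SELECT EXISTS (
  SELECT 1 FROM c_ones WHERE a_id = 42 AND b_id = 7
) AS value;

-- full boolean vector for one record of A, with the 0s reconstructed via LEFT JOIN
SELECT b.id AS b_id,
       IF(c_ones.a_id IS NULL, 0, 1) AS value
FROM b
LEFT JOIN c_ones
       ON c_ones.b_id = b.id AND c_ones.a_id = 42;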
Upvotes: 2
Reputation: 597
Millions of records in a table with many reads and writes won't be a bottleneck if you follow MySQL best practices:
Your storage engine should be InnoDB.
Your SELECT queries should not involve a full table scan (see the EXPLAIN sketch below).
Your tables should have the appropriate indexes.
You should look at all your possible use cases, because either way is a good idea if a use case supports it.
If you split your data across multiple tables, then a join operation has to be performed when needed.
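As a quick check for the full-table-scan point, EXPLAIN shows whether a query can use an index; the type column should say ref, eq_ref or range rather than ALL. The query below assumes the c(a_id, b_id, value) layout discussed in the question, with a_id = 42 as a placeholder ID.

EXPLAIN
SELECT b_id, value
FROM c
WHERE a_id = 42;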
Upvotes: 1