Doug Wolfgram
Doug Wolfgram

Reputation: 2126

Mysql optimization for simple records - what is best?

I am developing a system that will eventually have millions of users. Each user of the system may have acces to different 'tabs' in the system. I am tracking this with a table called usertabs. There are two ways to handle this.

Way 1: A single row for each user containing userid and tab1-tab10 as int columns.

The advantage of this system is that the query to get a single row by userid is very fast while the disadvantage is that the 'empty' columns take up space. Another disadvantage is that when I needed to add a new tab, I would have to re-org the entire table which could be tedious if there are millions of records. But this wouldn't happen very often.

Way 2: A single row contains userid and tabid and that is all. There would be up to 10 rows per user.

The advantage of this system is easy sharding or other mechanism for optimized storage and no wasted space. Rows only exist when necessary. The disadvantage is up to 10 rows must be read every time I access a record. If these rows are scattered, they may be slower to access or maybe faster, depending on how they were stored?

My programmer side is leaning towards Way 1 while my big data side is leaning towards Way 2.

Which would you choose? Why?

Upvotes: 0

Views: 62

Answers (2)

Branko Dimitrijevic
Branko Dimitrijevic

Reputation: 52127

If these rows are scattered, they may be slower to access or maybe faster, depending on how they were stored?

They don't have to be scattered if you use clustering correctly.

InnoDB tables are always clustered and if your child table's PK1 looks similar to: {user_id, tab_id}2, this will automatically store tabs belonging to the same user physically close together, minimizing I/O during querying for "tabs of the give user".

OTOH, if your child PK is: {tab_id, user_id}, this will store users connected to the same tab physically close together, making queries such as: "give me all users connected to given tab" very fast.

Unfortunately MySQL doesn't support leading-edge index compression (a-la Oracle), so you'll still pay the storage (and cache) price for repeating all these user_ids (or tab_ids in the second case) in the child table, but despite that, I'd still go for the solution (2) for flexibility and (probably) ease of querying.


1 Which InnoDB automatically uses as clustering key.

2 I.e. user's PK is at the leading edge of the child table's PK.

Upvotes: 1

Neville Kuyt
Neville Kuyt

Reputation: 29639

Premature optimization, and all that...

Option 1 may seem "easier", but you've already identified the major downside - extensibility is a huge pain.

I also really doubt that it would be faster than option 2 - databases are pretty much designed specifically to find related bits of data, and finding 10 records rather than 1 record is almost certainly not going to make a difference you can measure.

"Scattered" records don't really matter, the database uses indices to be able to retrieve data really quickly, regardless of their physical location.

This does, of course, depend on using indices for foreign keys, as @Barmar comments.

Upvotes: 2

Related Questions