jerrytouille
jerrytouille

Reputation: 1238

MySQL 3-way 1..n tables relation

1 database with 3 tables: user - photo - vote
- A user can have many photos.
- A photo can have many votes.
- A user can vote on many photos.
- A vote records:
. the result as an int (-1/disliked, 0/neutral, 1/liked)
. the id of the user who voted.

Here is what I have (all FKs are cascade on delete and update):
http://grab.by/iZYE
(sid = surrogate id)

My question is: this doesn't seem right, and I look at this for 2 days already and can't confidently move on. How can I optimize this or am I completely wrong?

Upvotes: 2

Views: 573

Answers (2)

Branko Dimitrijevic
Branko Dimitrijevic

Reputation: 52107

MySQL/InnoDB tables are always clustered (more on clustering here and here).

Since primary key also acts as a clustering key1, using the surrogate primary key means you are physically sorting the table in order that doesn't have a useful meaning for the client applications and cannot be utilized for querying.

Furthermore, secondary indexes in clustered tables can be "fatter" than in heap-based tables and may require double lookup.

For these reasons, you'd want to avoid surrogates and use more "natural" keys, similar to this:

enter image description here

({USER_ID, PICTURE_NO} in table VOTE references the same-named fields in PICTURE. The VOTE.VOTER_ID references USER.USER_ID. Use integers for *_ID and *_NO fields if you can.)

This physical model will enable extremely efficient querying for:

  • Pictures of the given user (a simple range scan on PICTURE primary/clustering index).
  • Votes on the given picture (a simple range scan on VOTE primary/clustering index). Depending on circumstances, this may actually be fast enough so you don't have to cache the sum in PICTURE.

If you need votes of the given user, change the VOTE PK to: {VOTER_ID, USER_ID, PICTURE_NO}. If you need both (votes of picture and votes of user), keep the existing PK, but create a covering index on {VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE}.


1 In InnoDB. There are DBMSes (such as MS SQL Server) where clustering key can differ from primary.

Upvotes: 1

G-Nugget
G-Nugget

Reputation: 8836

The first thing I see is that you have duplicate unique IDs on the tables. You don't need the sid columns; just use user_id, photo_id, and photo_user_id (maybe rename this one to vote_id). Those ID columns should also be INT type, definitely not VARCHARs. You probably don't need the vote total columns on photo; you can just run a query to get the total when you need it and not worry about keeping both tables in sync.

Assuming that you will only allow one vote per user on each photo, the structure of the can be modified so the only columns are user_id, photo_id, and vote_result. You would then make the primary key a composite index on (user_id, photo_id). However, since you're using foreign keys, that makes this table a bit more complicated.

Upvotes: 1

Related Questions