Reputation: 4933
I have a site where people can add their favorite TV series. There is one feature that makes it possible to check off episodes that you have seen.
Each episode that is checked off creates one record in a DB table (with user_id, show_id and episode_id).
This table now has over 600,000 rows and is growing very fast!
I have indexes set up, but I feel like the performance when querying this table is getting worse and worse.
My thoughts for a new solution:
So instead of:
user_id | show_id | episode_id
      1 |     123 |        7675
      1 |     123 |        7676
      1 |     123 |        7677
      1 |     456 |        5678
      1 |     456 |        5679
      1 |     456 |        5680
I could do this:
user_id | show_id | episode_ids
      1 |     123 | 7675,7676,7677
      1 |     456 | 5678,5679,5680
Then I would have to split the string into an array and use array.include?(some_id).
This should take some load off the database, but Ruby would have to do much heavier array work.
Am I on the right track? Or can anybody think of a better solution?
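For reference, the current table and the typical lookups against it look roughly like this (the table name and the exact queries are simplified, just to illustrate):

CREATE TABLE seen_episodes (
  user_id    INT NOT NULL,
  show_id    INT NOT NULL,
  episode_id INT NOT NULL
);

-- all episodes of one show that a user has checked off
SELECT episode_id FROM seen_episodes WHERE user_id = 1 AND show_id = 123;

-- has the user seen one specific episode?
SELECT 1 FROM seen_episodes WHERE user_id = 1 AND episode_id = 7675;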
Upvotes: 3
Views: 612
Reputation: 1877
I would stick with the normalized data. This sounds more like a query optimization problem. Keep in mind that MySQL (assuming you are using it) typically uses only one index per query, so you might get better performance by setting up a composite index. Also make use of the EXPLAIN statement, e.g. in the MySQL Query Browser. More info here: http://dev.mysql.com/doc/refman/5.1/en/explain.html
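For instance, a minimal sketch, assuming a table shaped like the one in the question (seen_episodes and the index name are only illustrative):

-- one composite index covering the common lookup
-- "which episodes of show X has user Y checked off?"
CREATE INDEX idx_user_show_episode
  ON seen_episodes (user_id, show_id, episode_id);

-- verify that the index is actually used
EXPLAIN SELECT episode_id
FROM seen_episodes
WHERE user_id = 1 AND show_id = 123;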
Upvotes: 0
Reputation: 22830
Whether you denormalize your data or not is a matter of debate. It can have its merits in specific circumstances, but from a relational point of view it probably shouldn't be your first choice. Instead, the preferred first step is to analyze the problem and try solutions that don't change the structure of the data but predominantly deal with the database system and its environment.
That being said - if you find that your proposed solution is the best way to improve performance, go ahead and denormalize. The point is that you should be aware of all the options and choose among them with concrete, performance-related goals in mind.
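As a rough sketch of that kind of analysis, assuming MySQL as the other answer does (the table name is illustrative), you can first measure what is actually slow before touching anything:

-- log slow statements so you know which queries to look at
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.5;  -- anything slower than half a second

-- see which indexes already exist on the table
SHOW INDEX FROM seen_episodes;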
Upvotes: 2
Reputation: 5421
Here's how I'd structure the tables:
USERS
userid INTEGER PRIMARY KEY
username text/varchar/whatever
SHOWS
showid INTEGER PK
showname varchar or nvarchar or text [depending on what database I was using]
etc etc
EPISODES
episodeid INTEGER PK
showid INTEGER FK references SHOWS [index this field]
ordinal DECIMAL [indicates which episode -- DECIMAL makes it easier to later insert an episode you overlooked]
episodename text/varchar/nvarchar whatever
etc etc
SEENIT
id INTEGER AUTOINCREMENT PK
userid INTEGER foreign key ref USERS
episodeid INTEGER foreign key ref EPISODES
You could place an alternate unique composite index on (userid, episodeid) or use separate indexes, one on userid, one on episodeid. I'd probably go with the latter.
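In MySQL terms (column types and index names here are only an illustration), SEENIT and the two indexing options would look roughly like this:

CREATE TABLE SEENIT (
  id        INT AUTO_INCREMENT PRIMARY KEY,
  userid    INT NOT NULL,
  episodeid INT NOT NULL,
  FOREIGN KEY (userid)    REFERENCES USERS (userid),
  FOREIGN KEY (episodeid) REFERENCES EPISODES (episodeid)
);

-- option 1: one unique composite index (also prevents duplicate check-offs)
CREATE UNIQUE INDEX seenit_user_episode ON SEENIT (userid, episodeid);

-- option 2: two separate single-column indexes
CREATE INDEX seenit_userid    ON SEENIT (userid);
CREATE INDEX seenit_episodeid ON SEENIT (episodeid);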
Upvotes: 1
Reputation: 63538
No, no, no, that is absolutely NOT the way to structure such a database. Comma-separated lists in varchar columns are a classic anti-pattern and about the worst option you could choose here.
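To see why (table and column names below are hypothetical): storing episode_ids as a comma-separated string forces you into queries like the first one, whose FIND_IN_SET() part cannot use an index, so every matching row's string has to be parsed; the normalized table can answer the same question straight from an index:

-- with episode_ids stored as '7675,7676,7677'
SELECT show_id
FROM seen_episodes_denormalized
WHERE user_id = 1
  AND FIND_IN_SET('7676', episode_ids) > 0;

-- versus the normalized table with an index on (user_id, episode_id)
SELECT 1 FROM seen_episodes WHERE user_id = 1 AND episode_id = 7676;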
It sounds to me like your diagnosis of the performance problem is based on guesswork. So instead:
600k rows is NOTHING for a table of three ints. Really: at roughly 12 bytes of data per row, that is under 10 MB before index and row overhead, so the whole table fits in RAM on even the tiniest of servers. Querying a table that sits in RAM should be so fast you don't need to worry about it.
If you do confirm that there really is a problem, ask a follow-up question containing your entire relevant schema, the exact queries, EXPLAIN plans and timing data.
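A quick way to check how big the table and its indexes actually are, assuming MySQL (the table name is illustrative):

SELECT table_rows,
       ROUND(data_length  / 1024 / 1024, 1) AS data_mb,
       ROUND(index_length / 1024 / 1024, 1) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name   = 'seen_episodes';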
Upvotes: 13