Reputation: 2654
I have a big table (200'000'000 rows), declared like this:
thread( forum_id tinyint, thread_id int,
date_first datetime, date_last datetime,
replycount mediumint, extra blob )
forum_id and thread_id form the primary key. On big forums (around a million topics) I sometimes have to run queries like SELECT thread_id FROM thread ORDER BY date_last DESC LIMIT 500000, 10
. These queries with huge offsets take a second or maybe a few to run.
So I thought that, by duplicating data, I could create a few extra tables for the forums with the most threads to speed this up. Only a few forums have over 100'000 topics, so there would be a table like
thread_for_forumid_123456 ( thread_id int,
date_first datetime, date_last datetime,
replycount mediumint )
What do you think about this? Will it speed up huge-offset queries? Do you have any other suggestions? Thanks.
Upvotes: 4
Views: 106
Reputation: 360
First, I would REALLY try to avoid your approach. I see it as the last resort for fixing performance issues.
You have alternatives, from hardware to software. On the hardware side you can buy Fusion-io or just an SSD drive (Raid vs SSD vs FusionIO). But you can also solve this in software, without investing money at all. In your scenario you should use a cache (e.g. memcached) if you aren't already. MySQL has partitioning; it's not the best choice in the world, but it can give you a good performance increase.
BUT, if you go with your idea, I suggest that you shard your data using a value that splits it across tables in a more balanced way. You could do something like this: create 50 tables, thread_0 to thread_49, and use (forum_id % 50) to decide which of those 50 tables a row lands in. That way you avoid creating a table every time you create a forum, and you avoid ending up with N tables. A SELECT with forum_id indexed will be very fast. You can also add some logic in the app to manage pagination and avoid huge offsets.
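The modulo routing above can be sketched in a few lines; shard_table is a hypothetical helper name, but the table naming (thread_0 to thread_49) follows the scheme suggested here:

```python
# Hypothetical helper: route a forum to one of 50 fixed shard tables
# named thread_0 .. thread_49, as suggested above.
def shard_table(forum_id, num_shards=50):
    return "thread_%d" % (forum_id % num_shards)

print(shard_table(123456))  # prints "thread_6"
```

Because the shard count is fixed, every query for a given forum_id deterministically hits one table, with no lookup table needed.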
Good luck!
PS: I'll add a blog post from the MySQL Performance Blog: "Why you don't want to shard"
Upvotes: 2
Reputation: 23563
It looks like your problem is the large offset, LIMIT 500000, 10
.
I don't see why you need an offset of 500'000 here. If you filter by forum_id, the offset should be very small, because I don't believe you need to skip 500'000 threads within one forum.
SELECT thread_id FROM thread WHERE forum_id = 123456 ORDER BY date_last DESC LIMIT 10, 50
Maybe also have a look at http://dev.mysql.com/doc/refman/5.1/en/limit-optimization.html
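One technique from that page's family of fixes is keyset ("seek") pagination: instead of an OFFSET, each page continues after the last (date_last, thread_id) pair of the previous page. A minimal sketch, using SQLite and a hypothetical page() helper purely for illustration (table and column names follow the question, not the asker's actual schema):

```python
import sqlite3

# Toy data: 100 threads in forum 1 with repeating dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thread (forum_id INTEGER, thread_id INTEGER, date_last TEXT)")
conn.executemany(
    "INSERT INTO thread VALUES (?, ?, ?)",
    [(1, i, "2012-01-%02d" % (i % 28 + 1)) for i in range(100)],
)

def page(forum_id, last_date=None, last_id=None, size=10):
    if last_date is None:
        sql = ("SELECT thread_id, date_last FROM thread WHERE forum_id = ? "
               "ORDER BY date_last DESC, thread_id DESC LIMIT ?")
        args = (forum_id, size)
    else:
        # thread_id breaks ties, so rows sharing a date are never
        # skipped or repeated across pages.
        sql = ("SELECT thread_id, date_last FROM thread WHERE forum_id = ? "
               "AND (date_last < ? OR (date_last = ? AND thread_id < ?)) "
               "ORDER BY date_last DESC, thread_id DESC LIMIT ?")
        args = (forum_id, last_date, last_date, last_id, size)
    return conn.execute(sql, args).fetchall()

first = page(1)
second = page(1, last_date=first[-1][1], last_id=first[-1][0])
```

With an index on (forum_id, date_last, thread_id), each page can be served straight from the index without scanning and discarding the skipped rows, so page 50'000 costs roughly the same as page 1.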
Upvotes: 0
Reputation: 10532
MySQL Partitioning sounds like functionality you may want to consider.
Upvotes: 0