Reputation: 2609
I have a mysql table with fulltext. Every day its size will increase by 30K data items. I am afraid the query speed will decrease when the data size increases to many GB. My situation permits me to limit the search to the most recent one million data items.
First I tried partitioning the table, but MySQL does not support fulltext indexes and partitioning at the same time:
http://dev.mysql.com/doc/refman/5.5/en/partitioning-limitations.html
Some friends suggest I use another product, such as Sphinx, Solr, or Oracle. But I'd rather use MySQL, for budget reasons. (Forgive my stubbornness.)
So, how can I optimize MySQL fulltext table searches? How can I keep an acceptable query speed as the volume of my data scales up?
Divide the table? (That makes queries complex.)
Only limit the query by id?
SELECT * FROM table
WHERE (MATCH (some_field) AGAINST ('+search_words' IN BOOLEAN MODE))
  AND id > last_id - 1000000 AND id < last_id
ORDER BY date DESC
Any other good suggestions? Thanks.
Upvotes: 2
Views: 987
Reputation: 108796
Here's a way to handle your problem. It's not super-beautiful but it will be quite efficient.
Create two copies of your table. Call one something like text_current and the second something like text_archive.
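A minimal sketch of that setup, assuming your existing table is named text_current (CREATE TABLE ... LIKE copies the column definitions and indexes, including the fulltext index, without copying any rows):

```sql
-- Hypothetical table names; adjust to your schema.
-- Copies columns and indexes (including the FULLTEXT index), but no rows.
CREATE TABLE text_archive LIKE text_current;
```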
Load your newly arriving rows of data into text_current. That way your current data will be available for search.
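A search against the current table might then look like this sketch (some_field is taken from the query in the question; record_date is an assumed timestamp column):

```sql
-- Searches only the bounded current table, so the fulltext index
-- stays around one million rows regardless of total data volume.
SELECT *
FROM text_current
WHERE MATCH (some_field) AGAINST ('+search_words' IN BOOLEAN MODE)
ORDER BY record_date DESC;
```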
Write two stored procedures: one called something like copy_text that selects older rows from text_current and inserts them into text_archive, and another called something like purge_text that deletes those older rows from text_current. Then run those two stored procedures on a schedule with MySQL's Event Scheduler.
If I were you, I would run the copy_text job once every night at 03:00, copying every item that arrived at any time during the previous day (midnight to midnight).
I would run the purge_text job once a week, purging everything from text_current that is more than 35 days old. (It takes 33.3 days to accumulate a million records at 30K records a day.)
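A sketch of the two procedures and their schedules. Table and column names are assumptions, and the Event Scheduler must be enabled (SET GLOBAL event_scheduler = ON):

```sql
DELIMITER //

CREATE PROCEDURE copy_text()
BEGIN
  -- Copy yesterday's rows (midnight to midnight) into the archive.
  INSERT INTO text_archive
  SELECT * FROM text_current
  WHERE record_date >= UNIX_TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 1 DAY))
    AND record_date <  UNIX_TIMESTAMP(CURDATE());
END //

CREATE PROCEDURE purge_text()
BEGIN
  -- Delete rows older than 35 days from the current table.
  DELETE FROM text_current
  WHERE record_date < UNIX_TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 35 DAY));
END //

DELIMITER ;

-- Nightly copy at 03:00, weekly purge.
CREATE EVENT copy_text_nightly
  ON SCHEDULE EVERY 1 DAY
  STARTS CURDATE() + INTERVAL 1 DAY + INTERVAL 3 HOUR
  DO CALL copy_text();

CREATE EVENT purge_text_weekly
  ON SCHEDULE EVERY 1 WEEK
  DO CALL purge_text();
```

Rows live in both tables for up to 35 days between the nightly copy and the weekly purge; that is harmless as long as searches go against only one table at a time.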
This method of migrating your data from a current table to an archive table has a handy feature: it is deterministic, selecting records by the day they arrived rather than by when the jobs happen to run.
Here's how to select records by date when the record_date column is a Unix timestamp. If you run this on any given day, it selects records from midnight on the previous day up to, but not including, midnight on the present day.
...
WHERE text_current.record_date >= UNIX_TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 1 DAY))
AND text_current.record_date < UNIX_TIMESTAMP(CURDATE())
...
It doesn't matter at what time of day you run this query; it always covers midnight to midnight. That's the beauty of CURDATE(): it means midnight today. (NOW() means, well, now; it's like CURDATE() but also includes a time of day. You don't want that, because you want to do your archiving midnight-to-midnight. If you use NOW(), you'll get caught by small unpredictabilities in when your stored procedures actually run.)
(Don't be tempted to use the BETWEEN operator for time ranges; it's inclusive at both ends, and you want the end of your time range to be exclusive.)
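For example, this BETWEEN version (column name assumed as above) would also match a row stamped exactly at midnight today, so that row would be picked up again by the next day's run; the half-open >= / < form avoids the double count:

```sql
-- Inclusive at BOTH ends: a row with record_date exactly equal to
-- UNIX_TIMESTAMP(CURDATE()) is selected here AND in tomorrow's run.
SELECT *
FROM text_current
WHERE record_date BETWEEN UNIX_TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 1 DAY))
                      AND UNIX_TIMESTAMP(CURDATE());
```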
Similarly, if you want to delete everything that's older than 35 days, do this:
DELETE
FROM text_current
WHERE text_current.record_date < UNIX_TIMESTAMP(DATE_SUB(CURDATE(), INTERVAL 35 DAY))
See how this goes? When you do your archiving and purging, you're always selecting records based on the DAY they were inserted, regardless of the TIME they were inserted. That's what I mean by deterministic.
Upvotes: 5