Reputation: 361
I need to sort a huge MyISAM table in a certain way so that SELECTs are faster under certain conditions.
Please note that this question is about how to do an ALTER table with zero downtime for SORTING the table in some specific column order. It is not a dup question of other questions asking about the more general case.
A simple way to achieve this is doing something like this:
ALTER table mytable ORDER BY col1, col2;
We may also use myisamchk --sort-records to achieve the same result.
In any case, both approaches let us to do very quick:
SELECT * WHERE col1=x order by col2;
Note that this is not a problem with the index but with fetching large amounts of ordered data from the table.
So far that ALTER has been working well. The problem now is that the ALTER command is slow and it locks the DB.
I believe we may use percona or openark tools for doing the same operation. Something like this:
pt-online-schema-change --alter "ENGINE=MyISAM, ORDER BY col1, col2" D=mydatabase,t=mytable -u root --dry-run
This internally creates a new table copies it and then moves names. It is pretty well documented.
However I'm not sure if/how percona will honour the "ORDER BY". I cannot see anything happening in the dry-run logs (but this may be normal). And this is not explained in the documentation.
Does anyone know how will percona ORDER BY the table?
EDIT: I launched PTDEBUG=1 ./pt-online-schema-change --alter "ENGINE=MyISAM, ORDER BY col1, col2"
on the tests server.
After checking the logs I found out that "ORDER BY" is not being applied... Any ideas? Does openark permit to do so?
Thanks!
Upvotes: 0
Views: 424
Reputation: 361
There's a way to achieve an ORDER BY with pt-online-schema-change.
First, make sure you have an index on the column you want to ORDER BY. Then use percona tool with option "--chunk-index" so that the chosen index is used to fetch the rows in the original table.
There's a problem though. Percona won't be able to fetch rows when indices have poor selectivity. In that case, create a composite index using the column you need to sort + ID for example (or any other column with high cardinality). Will be slow but may be a way yo get online rows sorted.
I got a boost of 10x with a large table of 100M rows which was very fragmented. OPTIMIZE table without column sorting did not improve the situation since values where randomly distributed in a table of 8GB. I hope this finding helps others.
Upvotes: 0