Reputation: 2386
Over the past week I've been trying to migrate a database of approximately 90 million rows from MySQL to a newly created Couchbase instance. I researched possible solutions on the web and found some tools, but they ultimately failed due to low memory availability. I also read about partitioning, but I'm no expert in MySQL administration, so that seemed beyond my abilities at the moment. Eventually I decided to implement my own script, which selects a certain amount of data from the existing MySQL table, serializes it for Couchbase's newly created bucket, and inserts it there. The tool works great for the first 5 million records, but after that MySQL takes far too long to retrieve further records.
It is worth mentioning that the MySQL table I'm working with is only being used by me, so no changes are made to it during the migration process.
The script I built leverages the LIMIT clause with an offset, as described in the SELECT Syntax documentation, and looks like this:
SELECT * FROM data LIMIT ?,?
where ?,? is generated by increasing the starting point of the selection by a certain number of records. For example, the following are possible queries issued by a single migration process:
SELECT * FROM data LIMIT 0,100000
SELECT * FROM data LIMIT 100000,200000
SELECT * FROM data LIMIT 200000,300000
...
The migration process stops when no records are retrieved. As I previously stated, the queries that select records starting from a position of about 5 million take too long and make the migration unworkable. I'm no database expert: I did nothing other than create a new MySQL database and tables via MySQL Workbench 6.3 CE, and no optimizations have been made on my data. The table I'm trying to migrate contains one column which acts as a key, is non-null, and holds a unique value. All other columns have no options enabled on them.
I would like to know if there is any other way for me to select the data sequentially so it could be inserted without duplicates or corruption. Any help on this matter is greatly appreciated!
Upvotes: 2
Views: 1909
Reputation: 108746
I'd guess MySQL starts taking an unusably long time to satisfy your LIMIT clauses once their offsets get large. LIMIT with a big offset does that: the server must generate and discard every row before the offset, so each successive chunk costs more than the last.
You'll have much better luck using an indexed column to select each segment of your table to export. There's no harm done if some segments contain fewer rows than others.
For example you could do
SELECT * FROM data WHERE datestamp >= '2017-01-01' AND datestamp < '2017-02-01';
SELECT * FROM data WHERE datestamp >= '2017-02-01' AND datestamp < '2017-03-01';
SELECT * FROM data WHERE datestamp >= '2017-03-01' AND datestamp < '2017-04-01';
SELECT * FROM data WHERE datestamp >= '2017-04-01' AND datestamp < '2017-05-01';
SELECT * FROM data WHERE datestamp >= '2017-05-01' AND datestamp < '2017-06-01';
SELECT * FROM data WHERE datestamp >= '2017-06-01' AND datestamp < '2017-07-01';
...
to break out your records by calendar month (assuming you have a datestamp column).
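If it helps, here's a minimal sketch of generating those monthly ranges in Python (the 2017 bounds are placeholder assumptions):
from datetime import date

# Yield the first day of every month from first up to and including stop.
def month_starts(first, stop):
    y, m = first.year, first.month
    while (y, m) <= (stop.year, stop.month):
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)

starts = list(month_starts(date(2017, 1, 1), date(2018, 1, 1)))
for lo, hi in zip(starts, starts[1:]):
    print("SELECT * FROM data WHERE datestamp >= '%s' AND datestamp < '%s';"
          % (lo, hi))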
Or, if you have an autoincrementing primary key id column, try this:
SELECT * FROM data WHERE id < 100000;
SELECT * FROM data WHERE id >= 100000 AND id < 200000;
SELECT * FROM data WHERE id >= 200000 AND id < 300000;
SELECT * FROM data WHERE id >= 300000 AND id < 400000;
SELECT * FROM data WHERE id >= 400000 AND id < 500000;
SELECT * FROM data WHERE id >= 500000 AND id < 600000;
...
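Here's a minimal sketch of the driving loop for that (assuming pymysql, placeholder credentials, and a hypothetical export() function that writes a row to Couchbase):
import pymysql

STEP = 100000
conn = pymysql.connect(host='localhost', user='user', password='secret',
                       db='mydb')  # placeholder credentials
cur = conn.cursor()
cur.execute('SELECT MIN(id), MAX(id) FROM data')
lo, hi = cur.fetchone()
start = lo
while start <= hi:
    # Each segment is found via the indexed id column, so the cost per
    # chunk stays flat instead of growing the way large LIMIT offsets do.
    cur.execute('SELECT * FROM data WHERE id >= %s AND id < %s',
                (start, start + STEP))
    for record in cur.fetchall():
        export(record)      # hypothetical: serialize + insert into Couchbase
    start += STEP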
An entirely different approach will also work. In your dumping program, do
SELECT * FROM data;
then have the program switch to another output file every n records. For example, here's that idea as a runnable Python sketch (assuming pymysql; the unbuffered SSCursor streams rows so the whole result set never sits in client memory):
import pymysql

ROWS_PER_FILE = 100000

# SSCursor streams rows from the server one at a time instead of
# buffering the entire result set in client memory.
conn = pymysql.connect(host='localhost', user='user', password='secret',
                       db='mydb', cursorclass=pymysql.cursors.SSCursor)
cur = conn.cursor()
cur.execute('SELECT * FROM data')

row_num = 0
rows_left = ROWS_PER_FILE
out = open('out%06d' % row_num, 'w')
for record in cur:
    out.write(repr(record) + '\n')  # stand-in for your Couchbase serialization
    row_num += 1
    rows_left -= 1
    if rows_left <= 0:              # start a new file every ROWS_PER_FILE rows
        out.close()
        out = open('out%06d' % row_num, 'w')
        rows_left = ROWS_PER_FILE
out.close()
This will use a single MySQL query, so you won't have to worry about segments. It should be quite fast.
Upvotes: 2
Reputation: 4937
You are doing the pagination wrong. See Using MySQL LIMIT to Constrain The Number of Rows Returned By SELECT Statement
The following illustrates the LIMIT clause syntax with two arguments:
SELECT
    column1, column2, ...
FROM
    table
LIMIT offset, count;
So you should use a fixed page size (count) and a variable offset with no overlapping:
SELECT * FROM data LIMIT 0,100000
SELECT * FROM data LIMIT 100000,100000
SELECT * FROM data LIMIT 200000,100000
...
SELECT * FROM data LIMIT 89900000,100000
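In loop form, that's a minimal sketch like this (assuming pymysql, placeholder credentials, and a hypothetical export() function for the Couchbase side):
import pymysql

PAGE_SIZE = 100000
conn = pymysql.connect(host='localhost', user='user', password='secret',
                       db='mydb')  # placeholder credentials
cur = conn.cursor()
offset = 0
while True:
    # Fixed count, moving offset: pages never overlap.
    cur.execute('SELECT * FROM data LIMIT %s, %s', (offset, PAGE_SIZE))
    rows = cur.fetchall()
    if not rows:
        break               # no records returned: migration is done
    for record in rows:
        export(record)      # hypothetical: serialize + insert into Couchbase
    offset += PAGE_SIZE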
Upvotes: 2