Reputation: 1676
So I have ~10 million records in an Excel file that have to be parsed out in a specific way (I can't just convert to CSV and insert it as-is) and inserted into different tables of a MySQL database. I've gotten the run time down from all night to only a couple of hours, but I would like to decrease it even further. Does anyone have any tricks or tips that could help? I'm using Java and JDBC to parse the file and connect to the database.
Upvotes: 2
Views: 3002
Reputation: 16559
An idea...
Create a staging (temporary) database in MySQL with a table called excel_staging that matches the structure of your Excel file - use the MyISAM engine for this table.
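For example, the staging table might look something like this (the column names and types are just guesses to match the field1/field2 placeholders below - mirror whatever your spreadsheet actually contains):
create table excel_staging (
    field1 int,
    field2 varchar(255),
    field3 varchar(255)
    -- ...one column per spreadsheet column
) engine = myisam;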
Use load data infile to load the Excel file (saved as CSV) into the excel_staging table - it shouldn't take more than a few minutes to populate, especially as it's MyISAM.
truncate table excel_staging;
load data infile 'excel_staging.csv'
into table excel_staging
fields terminated by...
lines terminated by..
(
field1,
field2,
...
);
Write lots of select into outfile statements that extract data from the excel_staging table into individual CSV files, which you will then load into your individual InnoDB production tables. You can be really creative at this point if necessary - you may even have to load extra data to support joins etc. so you can generate nicely formatted CSV output.
select distinct cust_id, name into outfile 'customers.csv'
fields terminated by...
lines terminated by...
from
excel_staging
order by
cust_id; -- order for innodb import
select distinct dept_id, name into outfile 'departments.csv'
fields terminated by...
lines terminated by...
from
excel_staging
order by
dept_id;
Load the nicely formatted, cleansed and ordered-by-primary-key CSV files into your production InnoDB tables using load data infile...
load data infile 'customers.csv'
into table customers
fields terminated by...
lines terminated by..
(
cust_id,
name
);
...
Excluding the time to code the solution (say 30 minutes), you should be able to load into staging, output to CSV and load into the production tables in about, ermm... 6 minutes end to end.
Hope this helps.
Upvotes: 2
Reputation: 4402
A few JDBC performance tips: set your connection object's autoCommit to false, but make sure to commit after a significant number of inserts (every 100K or more). Also, use and reuse a PreparedStatement object rather than a plain Statement object.
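A minimal sketch of that pattern (the connection URL, table and column names are placeholders - plug in your own schema and your existing Excel-parsing loop):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class FastInsert {
    public static void main(String[] args) throws Exception {
        // placeholder URL/credentials and table - adjust to your setup
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "pass")) {
            conn.setAutoCommit(false); // no commit after every single insert

            String sql = "insert into customers (cust_id, name) values (?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                String[][] rows = {{"1", "Alice"}, {"2", "Bob"}}; // stand-in for your parsed Excel rows
                int count = 0;
                for (String[] row : rows) {
                    ps.setInt(1, Integer.parseInt(row[0]));
                    ps.setString(2, row[1]);
                    ps.executeUpdate();        // the same PreparedStatement is reused each time
                    if (++count % 100_000 == 0) {
                        conn.commit();         // commit every 100K inserts or so
                    }
                }
                conn.commit();                 // commit whatever is left
            }
        }
    }
}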
Upvotes: 1
Reputation: 3464
MySQL allows you to load data from a file. Perhaps what you should do is: read 10,000 records and write them to a file, kick off load data infile on that file, and in parallel start reading the next 10,000 records.
So this should get you closer to a fast solution.
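A rough sketch of that idea, assuming a two-column staging table and comma-separated chunk files (the connection details, table name and parse step are all made up - substitute your own):
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChunkedLoad {
    static final int CHUNK_SIZE = 10_000;

    public static void main(String[] args) throws Exception {
        ExecutorService loader = Executors.newSingleThreadExecutor();
        List<String[]> records = parseExcel(); // stand-in for your real parsing code
        int chunkNo = 0;

        for (int start = 0; start < records.size(); start += CHUNK_SIZE) {
            // write the next 10,000 records out to a CSV file
            Path csv = Files.createTempFile("chunk" + (chunkNo++), ".csv");
            try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(csv))) {
                int end = Math.min(start + CHUNK_SIZE, records.size());
                for (String[] r : records.subList(start, end)) {
                    out.println(String.join(",", r));
                }
            }
            // hand the file to a background thread so the load runs while we keep parsing
            loader.submit(() -> loadFile(csv));
        }
        loader.shutdown();
    }

    static void loadFile(Path csv) {
        // LOCAL reads the file from the client machine; the server/driver may need
        // local-infile support enabled for this to work
        String path = csv.toAbsolutePath().toString().replace('\\', '/');
        String sql = "load data local infile '" + path
                + "' into table excel_staging fields terminated by ',' lines terminated by '\\n'";
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "pass");
             Statement st = conn.createStatement()) {
            st.execute(sql);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    static List<String[]> parseExcel() {
        // placeholder data so the sketch compiles; replace with the real Excel reader
        return List.of(new String[]{"1", "Alice"}, new String[]{"2", "Bob"});
    }
}
A single-threaded loader queue keeps the loads from tripping over each other while the parser stays busy writing the next chunk.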
Upvotes: 5
Reputation: 48795
Make sure to disable foreign key checks while you're doing your inserting (this only affects InnoDB) - there's a pretty drastic speedup. And then, of course, re-enable the checks when you're done.
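For example (foreign_key_checks is a session variable, so issue it on the same connection that does the inserts):
set foreign_key_checks = 0;
-- ...do the bulk inserts...
set foreign_key_checks = 1;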
Upvotes: 1
Reputation: 5848
Look into using executeBatch and doing blocks of 1000 or so. That will help a lot.
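Roughly like this (sketch only - the connection URL, table, columns and sample rows are placeholders):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchedInsert {
    public static void main(String[] args) throws Exception {
        // placeholder URL/credentials and table - adjust to your schema
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                     "insert into customers (cust_id, name) values (?, ?)")) {
            String[][] rows = {{"1", "Alice"}, {"2", "Bob"}}; // stand-in for the parsed Excel rows
            int count = 0;
            for (String[] row : rows) {
                ps.setInt(1, Integer.parseInt(row[0]));
                ps.setString(2, row[1]);
                ps.addBatch();             // queue the insert instead of executing it immediately
                if (++count % 1000 == 0) {
                    ps.executeBatch();     // run the ~1000 queued inserts as a batch
                }
            }
            ps.executeBatch();             // flush whatever is left in the batch
        }
    }
}
If you're on MySQL Connector/J, adding rewriteBatchedStatements=true to the JDBC URL should let the driver rewrite each batch into multi-row inserts, which usually speeds this up further.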
Upvotes: 3