Reputation: 2263
I have a batch process where I need to update my DB table, around 100,000-500,000 rows, from an uploaded CSV file. Normally it takes 20-30 minutes, sometimes longer.
What is the best way to do this? Are there any good practices for it? Any suggestions would be appreciated.
Thanks.
Upvotes: 0
Views: 989
Reputation: 117477
I can't do it with cron, because this is under user control. A user clicks the process button and can later check the logs to see the process status.
When the user presses said button, set a flag in a table in the database. Then have your cron job check for this flag. If it's there, start processing, otherwise don't. If applicable, you could use the same table to post some kind of status update (e.g. xx% done), so the user has some feedback about the progress.
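A minimal sketch of the cron side, assuming a made-up import_jobs table with status and progress columns (table names, columns and credentials are only placeholders):
<?php
// cron_worker.php - run every minute via cron (hypothetical script and table names)
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

// Pick up a job the user has flagged for processing
$job = $db->query("SELECT id, csv_path FROM import_jobs WHERE status = 'pending' LIMIT 1")->fetch_assoc();
if (!$job) {
    exit; // nothing to do this run
}

$db->query("UPDATE import_jobs SET status = 'running' WHERE id = " . (int)$job['id']);

// ... do the actual import here, posting progress as you go, e.g.:
// $db->query("UPDATE import_jobs SET progress = 42 WHERE id = " . (int)$job['id']);

$db->query("UPDATE import_jobs SET status = 'done', progress = 100 WHERE id = " . (int)$job['id']);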
Upvotes: 0
Reputation: 338178
It takes 30 minutes to import 500,000 rows from a CSV?
Have you considered letting MySQL do the hard work? There is LOAD DATA INFILE, which supports dealing with CSV files:
LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
If the file is not quite in the right shape to be imported right to the target table, you can either use PHP to transform it beforehand, or LOAD it into a "staging" table and let MySQL handle the necessary transformation — whichever is faster and more convenient.
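For example, a staging run driven from PHP could look roughly like this (table, column and file names are only placeholders, and the transformations are made up):
<?php
// Hypothetical staging flow: load the raw CSV, then let MySQL transform it on the way in.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

$db->query("CREATE TEMPORARY TABLE staging (name VARCHAR(255), price VARCHAR(50), created VARCHAR(50))");

$db->query(
    "LOAD DATA INFILE '/tmp/data.csv' INTO TABLE staging " .
    "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' " .
    "LINES TERMINATED BY '\\n'"
);

// Transform while copying into the real table
$db->query(
    "INSERT INTO products (name, price, created_at) " .
    "SELECT name, CAST(price AS DECIMAL(10,2)), STR_TO_DATE(created, '%d.%m.%Y') FROM staging"
);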
As an additional option, it seems to be possible to run MySQL queries asynchronously through the MySQL Native Driver for PHP (mysqlnd). Maybe you can explore that option as well. It would enable you to keep the UI snappy.
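With mysqlnd, an asynchronous query looks roughly like this (sketch only; the stored procedure name is made up, and you still need to decide what the page does while polling):
<?php
// Requires mysqlnd; MYSQLI_ASYNC sends the query without blocking the script.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$db->query("CALL long_running_import()", MYSQLI_ASYNC); // hypothetical procedure

// Later (or in a loop), check whether the query has finished:
$links = $errors = $reject = array($db);
if (mysqli_poll($links, $errors, $reject, 0) > 0) {
    $result = $db->reap_async_query();
    // handle $result ...
}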
Upvotes: 8
Reputation: 75704
We had a feature like that in a big application. We had the issue of inserting millions of rows from a CSV into a table with 9 indexes. After lots of refactoring, we found the ideal way to insert the data was to load it into a [temporary] table with the MySQL LOAD DATA INFILE command, do the transformations there, and copy the result with multiple insert queries into the actual table (INSERT INTO ... SELECT FROM), processing only 50k lines or so with each query (which performed better than issuing a single insert, but YMMV).
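A rough sketch of that chunked copy, assuming the staging table has an auto-increment id (all names are illustrative):
<?php
// Copy from staging into the real table in batches of ~50k rows per INSERT ... SELECT.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$chunk = 50000;

$maxId = (int)$db->query("SELECT MAX(id) FROM staging")->fetch_row()[0];

for ($start = 1; $start <= $maxId; $start += $chunk) {
    $end = $start + $chunk - 1;
    $db->query(
        "INSERT INTO target_table (col1, col2) " .
        "SELECT col1, col2 FROM staging WHERE id BETWEEN $start AND $end"
    );
}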
Upvotes: 1
Reputation: 14114
PEAR has a package called Benchmark with a Benchmark_Profiler class that can help you find the slowest sections of your code so you can optimize them.
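From memory, usage is along these lines (double-check against the package documentation; the section names are arbitrary):
<?php
require_once 'Benchmark/Profiler.php';

$profiler = new Benchmark_Profiler();
$profiler->start();

$profiler->enterSection('csv_parsing');
// ... parse the CSV ...
$profiler->leaveSection('csv_parsing');

$profiler->enterSection('db_inserts');
// ... run the inserts ...
$profiler->leaveSection('db_inserts');

$profiler->stop();
$profiler->display(); // prints time spent per section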
Upvotes: 1
Reputation: 321618
If you're doing a lot of inserts, are you doing bulk inserts? i.e. like this:
INSERT INTO table (col1, col2) VALUES (val1a, val2a), (val1b, val2b), (....
That will dramatically speed up inserts.
Another thing you can do is disable indexing while you make the changes, then let it rebuild the indexes in one go when you're finished.
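Rough sketch of both ideas combined (table, column and file names are placeholders; note that DISABLE KEYS only defers non-unique index maintenance on MyISAM tables):
<?php
// Build multi-row INSERTs from the CSV and rebuild indexes once at the end.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

$db->query("ALTER TABLE my_table DISABLE KEYS");

$values = array();
$handle = fopen('/tmp/data.csv', 'r');
while (($row = fgetcsv($handle)) !== false) {
    $values[] = "('" . $db->real_escape_string($row[0]) . "', '" . $db->real_escape_string($row[1]) . "')";
    if (count($values) === 1000) { // flush every 1000 rows
        $db->query("INSERT INTO my_table (col1, col2) VALUES " . implode(',', $values));
        $values = array();
    }
}
if ($values) {
    $db->query("INSERT INTO my_table (col1, col2) VALUES " . implode(',', $values));
}
fclose($handle);

$db->query("ALTER TABLE my_table ENABLE KEYS"); // rebuild the indexes in one go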
With a bit more detail about what you're doing, you might get more ideas.
Upvotes: 1