taras

Reputation: 2263

How to do long time batch processes in PHP?

I have a batch process where I need to update my db table, around 100,000-500,000 rows, from an uploaded CSV file. Normally it takes 20-30 minutes, sometimes longer.

What is the best way to do this? Are there any good practices for it? Any suggestion would be appreciated.

Thanks.

Upvotes: 0

Views: 989

Answers (5)

troelskn

Reputation: 117477

"I can't do it with cron, because this is under user control. A user clicks the process button and can later check the logs to see the process status."

When the user presses said button, set a flag in a table in the database. Then have your cron job check for this flag. If it's there, start processing, otherwise don't. If applicable, you could use the same table to post some kind of status update (e.g. xx% done), so the user has some feedback about the progress.
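As a rough sketch of that flow in PHP (the import_jobs table, its columns, and the connection details below are just assumptions for illustration):

<?php
// cron_worker.php -- run by cron every minute or so.
// Hypothetical schema: import_jobs(id, csv_path, status, progress).
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// Pick up a job that the "process" button flagged as pending.
$job = $pdo->query(
    "SELECT id, csv_path FROM import_jobs WHERE status = 'pending' LIMIT 1"
)->fetch(PDO::FETCH_ASSOC);

if ($job) {
    $id = (int) $job['id'];
    $pdo->exec("UPDATE import_jobs SET status = 'running' WHERE id = $id");

    // ... run the actual CSV import here, updating progress as you go ...
    $pdo->exec("UPDATE import_jobs SET progress = 50 WHERE id = $id");

    $pdo->exec("UPDATE import_jobs SET status = 'done', progress = 100 WHERE id = $id");
}

The page behind the button then only has to insert a row with status = 'pending', and the log/status page reads the progress back from the same table.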

Upvotes: 0

Tomalak

Reputation: 338178

It takes 30 minutes to import 500,000 rows from a CSV?

Have you considered letting MySQL do the hard work? There is LOAD DATA INFILE, which supports dealing with CSV files:

LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n';

If the file is not quite in the right shape to be imported right to the target table, you can either use PHP to transform it beforehand, or LOAD it into a "staging" table and let MySQL handle the necessary transformation — whichever is faster and more convenient.
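For the staging-table route, a minimal sketch from PHP (table names, file path and columns are made up for illustration):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// 1. Bulk-load the raw CSV into a staging table that mirrors the file layout.
$load = <<<'SQL'
LOAD DATA INFILE '/tmp/upload.csv' INTO TABLE staging_rows
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
SQL;
$pdo->exec($load);

// 2. Let MySQL transform the data while copying it into the real table.
$pdo->exec(
    "INSERT INTO target_rows (sku, name, price_cents)
     SELECT TRIM(sku), name, ROUND(price * 100)
     FROM staging_rows"
);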

As an additional option, there seems to be a possibility to run MySQL queries asynchronously through the MySQL Native Driver for PHP (MYSQLND). Maybe you can explore that option as well. It would enable you to retain snappy UI performance.
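For reference, mysqlnd exposes this through mysqli's MYSQLI_ASYNC flag together with mysqli_poll() and reap_async_query(). A very rough sketch (table and file names are assumptions, and note that the connection has to stay open until the query finishes):

<?php
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

// Fire the long-running statement without blocking.
$mysqli->query("LOAD DATA INFILE '/tmp/upload.csv' INTO TABLE staging_rows", MYSQLI_ASYNC);

// Later (or in a loop), check whether it has finished.
$read = $error = $reject = array($mysqli);
if (mysqli_poll($read, $error, $reject, 0, 500000) > 0) {
    $result = $mysqli->reap_async_query();
    // The query is done; update your status table here.
}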

Upvotes: 8

soulmerge

Reputation: 75704

We had a feature like that in a big application. We had the issue of inserting millions of rows from a CSV into a table with 9 indexes. After lots of refactoring, we found the ideal way to insert the data was to load it into a [temporary] table with the MySQL LOAD DATA INFILE command, do the transformations there, and copy the result into the actual table with multiple insert queries (INSERT INTO ... SELECT FROM), processing only 50k lines or so with each query (which performed better than issuing a single insert, but YMMV).
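A sketch of that chunked copy (table and column names are invented, and it assumes an auto-increment id on the staging table):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$chunk = 50000;
$maxId = (int) $pdo->query('SELECT MAX(id) FROM staging_rows')->fetchColumn();

// Copy the transformed rows over in slices of ~50k rows so each
// INSERT ... SELECT statement stays reasonably small.
for ($start = 1; $start <= $maxId; $start += $chunk) {
    $end = $start + $chunk - 1;
    $pdo->exec(
        "INSERT INTO target_rows (col1, col2)
         SELECT col1, col2 FROM staging_rows
         WHERE id BETWEEN $start AND $end"
    );
}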

Upvotes: 1

John Downey

Reputation: 14114

PEAR has a package called Benchmark that includes a Benchmark_Profiler class, which can help you find the slowest sections of your code so you can optimize them.
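Usage is roughly as follows (going by the PEAR manual, so treat the exact API as an assumption):

<?php
require_once 'Benchmark/Profiler.php';

// TRUE = start automatically and print the report when the script ends.
$profiler = new Benchmark_Profiler(TRUE);

$profiler->enterSection('csv_import');
// ... the part of the import you suspect is slow ...
$profiler->leaveSection('csv_import');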

Upvotes: 1

Greg

Reputation: 321618

If you're doing a lot of inserts, are you doing bulk inserts? i.e. like this:

INSERT INTO table (col1, col2) VALUES (val1a, val2a), (val1b, val2b), (...

That will dramatically speed up inserts.

Another thing you can do is disable indexing while you make the changes, then let it rebuild the indexes in one go when you're finished.
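Putting both ideas together, a rough sketch with mysqli (table, column and file names are made up; note that ALTER TABLE ... DISABLE KEYS only defers non-unique index maintenance, and only on MyISAM tables):

<?php
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

// Defer (non-unique) index maintenance until the end.
$mysqli->query('ALTER TABLE my_table DISABLE KEYS');

$values = array();
$handle = fopen('/tmp/upload.csv', 'r');
while (($row = fgetcsv($handle)) !== false) {
    $col1 = $mysqli->real_escape_string($row[0]);
    $col2 = $mysqli->real_escape_string($row[1]);
    $values[] = "('$col1', '$col2')";

    // Flush a multi-row INSERT every 1000 rows.
    if (count($values) >= 1000) {
        $mysqli->query('INSERT INTO my_table (col1, col2) VALUES ' . implode(',', $values));
        $values = array();
    }
}
if ($values) {
    $mysqli->query('INSERT INTO my_table (col1, col2) VALUES ' . implode(',', $values));
}
fclose($handle);

// Rebuild the indexes in one go.
$mysqli->query('ALTER TABLE my_table ENABLE KEYS');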

Give a bit more detail about what you're doing and you might get more ideas.

Upvotes: 1
