Jarda K.

Reputation: 161

PHP - processing big data

I am trying to process big data with PHP (100,000,000 records). I download each record from a different server, then run some text checks, and roughly 10% of the records pass and get inserted into my DB (MySQL). My problems are:

  1. the web browser eventually gives up, so processing is interrupted prematurely

  2. after every N checks I want to print the running count of matching records in the browser, but nothing is printed

MySQL is not the problem. It seems the browser only renders output after the script has finished, but I want to pause the script (thread) briefly, let the browser print my result, and then continue. While the data is being processed, the browser is "frozen". Any ideas?

example:

    for ($i = 0; $i < 100000000; $i++) {
        if (($i % 1000) == 0) { // every 1000th iteration
            echo $i; // <=== HERE I need to pause the script and let the browser print my result
        }
    }

Upvotes: 2

Views: 5991

Answers (1)

Xymanek

Reputation: 1389

First of all, ignore_user_abort is your best friend. Pair it with set_time_limit(0) and you've secured your process against dying.
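A minimal sketch of that setup (just the two calls, placed before the processing loop):

    <?php
    // Keep running even if the user closes the tab or the connection drops.
    ignore_user_abort(true);

    // Remove the max-execution-time limit for this request.
    set_time_limit(0);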

Second, sending something to the user mid-run is a hard task. The reason is that whatever you output passes through a number of buffers: PHP's, Apache's, any load balancers your application might use, the browser's, etc. (note: these buffers can usually be configured, or outright disabled, but they exist for a reason). Therefore simply echoing might not always work.
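For completeness, a sketch of coaxing output through PHP's own buffers; the 4 KB padding is an assumption, a common workaround for servers and browsers that hold back small writes, and upstream buffers may still interfere:

    <?php
    ignore_user_abort(true); // as above
    set_time_limit(0);

    // Close any active output buffers so echo goes straight to the server.
    while (ob_get_level() > 0) {
        ob_end_flush();
    }

    for ($i = 0; $i < 100000000; $i++) {
        if (($i % 1000) == 0) {
            // Pad the write so buffers waiting for a minimum size let it through.
            echo $i . str_pad('', 4096) . "\n";
            flush(); // push PHP's internal buffer out to the web server
        }
    }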

There are a number of things you can do to address this issue.

One solution is to use a real-time communication service like Pusher or Ably (I prefer the latter for its nicer free plan), or to roll out your own solution using WebSockets. You would then simply publish a message every 1k iterations and listen for it in your JS. A bonus is that if the user's browser crashes and they reopen it, the process and the updates will still be running correctly. This is (in my opinion) the most proper way to do this, but it can be hard to get right. A server-side sketch follows.
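As a rough illustration, a server-side sketch using the pusher/pusher-php-server package; the credentials, channel name, and event name below are invented for the example:

    <?php
    // Assumes `composer require pusher/pusher-php-server`; 'import-progress',
    // 'progress', and the credentials are all made up here.
    require __DIR__ . '/vendor/autoload.php';

    $pusher = new Pusher\Pusher('app-key', 'app-secret', 'app-id', ['cluster' => 'eu']);

    $matched = 0;
    for ($i = 0; $i < 100000000; $i++) {
        // ...download the record, run the text checks, count/insert matches...

        if (($i % 1000) == 0) {
            // Publish progress; the page subscribes to this channel in JS
            // and updates the counter outside the request/response cycle.
            $pusher->trigger('import-progress', 'progress', [
                'processed' => $i,
                'matched'   => $matched,
            ]);
        }
    }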

Another solution would be to split the data: JS sends an AJAX request, the server processes 1k rows, echoes the count, and dies. Then JS sends another request, which processes the next 1k rows (up to the 2k mark), and so on. This is easier to implement, but it relies on the client to keep sending the requests. Also, "downloading every record from a different server" can be hard to fit into this method. A sketch of such an endpoint follows.
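A minimal sketch of the chunked endpoint; process_chunk.php, fetch_record() and passes_text_checks() are hypothetical names standing in for the real download and check logic:

    <?php
    // process_chunk.php -- invented name; fetch_record() and
    // passes_text_checks() are placeholders, not real functions.
    $offset    = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
    $chunkSize = 1000;
    $matched   = 0;

    for ($i = $offset; $i < $offset + $chunkSize; $i++) {
        $record = fetch_record($i);         // download one record from the remote server
        if (passes_text_checks($record)) {  // the text checks from the question
            $matched++;
            // INSERT the matching record into MySQL here
        }
    }

    // Report progress; the client's JS reads this and fires the next
    // request with ?offset=<next> until the data runs out.
    header('Content-Type: application/json');
    echo json_encode(['matched' => $matched, 'next' => $offset + $chunkSize]);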

Upvotes: 1
