Reputation: 493
I'm building a cron job that does the following:
1. Get records from DB
2. For each record, fire a curl request to an API (some requests are quick and some upload large images or videos).
3. If a request is not successful, create a new request with slightly different parameters (still based on the record) and send it again. This can happen several times.
4. On successful request do some DB select/inserts (based on the original record that caused sending this request).
Sending the requests should happen in parallel as some take minutes (large uploads) and some are very quick.
What would be most appropriate: having a master script that gets the records from the DB and spawns a process for each record to call the API and parse the response, or using curl_multi to send multiple requests at the same time from the same script and parse each one as it returns?
If using multiple processes, what would be the best way to create them: pcntl, popen, etc.?
If using curl_multi, how would I know which DB record corresponds to which returning request?
EDIT: If using curl_multi I'd probably employ this technique: http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/
so that it wouldn't wait for all requests to complete before I start processing the responses.
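For the record-to-request mapping question, one common approach is to key an array by the curl handle's identity when adding it to the multi handle, then look the record back up when `curl_multi_info_read()` reports a finished transfer. A minimal sketch, assuming PHP 8+ (`CurlHandle` objects, so `spl_object_id()` works; on older PHP, handles are resources and `(int) $ch` serves as the key) and a hypothetical `$records` array in place of the real DB query:

```php
<?php
// Hypothetical records; in the real job these come from the DB.
$records = [
    ['id' => 1, 'url' => 'https://api.example.com/a'],
    ['id' => 2, 'url' => 'https://api.example.com/b'],
];

$mh  = curl_multi_init();
$map = []; // handle id => originating record

foreach ($records as $record) {
    $ch = curl_init($record['url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $map[spl_object_id($ch)] = $record;
}

do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh); // wait for activity instead of busy-looping
    }
    // Drain transfers as they complete, without waiting for the rest.
    while ($info = curl_multi_info_read($mh)) {
        $ch     = $info['handle'];
        $record = $map[spl_object_id($ch)];
        if ($info['result'] === CURLE_OK) {
            $body = curl_multi_getcontent($ch);
            // success: do the DB selects/inserts for $record here
        } else {
            // failure: build the retry request for $record here
        }
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
        unset($map[spl_object_id($ch)]);
    }
} while ($active && $status === CURLM_OK);

curl_multi_close($mh);
```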
Thanks!
Upvotes: 3
Views: 1873
Reputation: 493
In the end I went with multiprocessing using pcntl (limiting the number of concurrent processes). It seemed to me that curl_multi wouldn't scale to thousands of requests.
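The fork-with-a-cap pattern described here can be sketched roughly as follows. `MAX_PROCS`, `$records`, and `handleRecord()` are assumptions standing in for the real job; this is an outline, not the author's actual code:

```php
<?php
const MAX_PROCS = 10; // assumed concurrency limit

function handleRecord(array $record): void
{
    // fire the curl request, retry with adjusted parameters on failure,
    // and run the follow-up DB selects/inserts on success
}

$records = []; // fetched from the DB by the master script
$running = 0;

foreach ($records as $record) {
    if ($running >= MAX_PROCS) {
        pcntl_wait($status); // block until one child exits
        $running--;
    }
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // Child process: open a fresh DB connection here rather than
        // reusing the parent's, since a connection shared across a fork
        // ends up corrupted.
        handleRecord($record);
        exit(0);
    }
    $running++; // parent keeps dispatching
}

// Reap the remaining children before the cron job exits.
while ($running-- > 0) {
    pcntl_wait($status);
}
```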
Upvotes: 1
Reputation: 3364
I had a similar issue once processing a large dataset.
The simplest answer for me was to make 4 separate scripts, each written to take a specific quarter of the DB rows involved and, in my case, do processing (in your case, curl requests). This prevents a big request in one of the processes from holding up the others.
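The split can also be done with a single worker script parameterized by partition number instead of four copies. A sketch, assuming a `records` table with a numeric `id` and hypothetical connection details, launched as `php worker.php 0` through `php worker.php 3`:

```php
<?php
// worker.php: process one quarter of the records, selected by id modulo.
$partition  = (int) ($argv[1] ?? 0);
$partitions = 4;

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('SELECT * FROM records WHERE id % :n = :p');
$stmt->execute(['n' => $partitions, 'p' => $partition]);

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $record) {
    // fire the curl request for this record and handle the response
}
```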
In contrast, a single script using curl_multi lets you queue up multiple transfers at once, but all of the response handling still happens serially in that one process, so a large request can still hold up the work behind it.
Optimally I'd instead write this in a language with native support for multithreading, so things could happen concurrently without resorting to hacks, but that's understandably not always an option.
Upvotes: 1