Reputation: 3
I have a script that loops over an array of numbers; those numbers are passed to a function which calls an API. The API returns JSON data, which is then written to a CSV.
for label_number in label_array:
    call_api(domain, api_call_1, api_call_2, label_number, api_key)
The list can be up to 7,000 elements long, and since the API takes a few seconds to respond, running the entire script can take hours. Multiprocessing seems the way to go here, but I can't quite work out how to do it with the above loop. The documentation I am looking at is
https://docs.python.org/3.5/library/multiprocessing.html
I found a similar question at
Python Multiprocessing a for loop
But adapting it doesn't seem to work; I think I am buggering it up when it comes to passing all the variables into the function.
Any help would be appreciated.
Upvotes: 0
Views: 1365
Reputation: 1224
Multiprocessing could help, but this sounds more like a threading problem: the work is I/O-bound, and any I/O-bound work should be made asynchronous, which is what threading does (a thread-pool sketch follows below). Better, from Python 3.4 onwards, you could use asyncio.
https://docs.python.org/3.4/library/asyncio.html
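As a minimal sketch of the threading route, assuming call_api and its arguments are defined exactly as in the question, concurrent.futures could be used like this:

from concurrent.futures import ThreadPoolExecutor

# Sketch only: call_api, domain, api_call_1, api_call_2, label_array and
# api_key are assumed to exist as in the question.
with ThreadPoolExecutor(max_workers=20) as executor:
    futures = [
        executor.submit(call_api, domain, api_call_1, api_call_2,
                        label_number, api_key)
        for label_number in label_array
    ]
    for future in futures:
        future.result()  # re-raises any exception from the worker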
If you have Python 3.5, this will be useful: https://docs.python.org/3.5/library/asyncio-task.html#example-hello-world-coroutine
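For example, here is a minimal asyncio sketch, again assuming the names from the question. Since call_api is a blocking function, it is handed to the event loop's default thread pool via run_in_executor:

import asyncio

async def fetch_all(loop):
    # Schedule every blocking call_api call on the default executor and
    # wait for all of them concurrently.
    futures = [
        loop.run_in_executor(None, call_api, domain, api_call_1,
                             api_call_2, label_number, api_key)
        for label_number in label_array
    ]
    await asyncio.gather(*futures)

loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop))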
You can mix asyncio with multiprocessing to get the optimized result. In addition, I use joblib.
import multiprocessing
from joblib import Parallel, delayed

num_cores_to_use = multiprocessing.cpu_count()

def parallelProcess(i):
    # Worker i takes every num_cores_to_use-th element, offset by i, so
    # the label numbers are split evenly across the workers.
    for index, label_number in enumerate(label_array):
        if index % num_cores_to_use == i:
            call_api_async(domain, api_call_1, api_call_2, label_number, api_key)

if __name__ == "__main__":
    Parallel(n_jobs=num_cores_to_use)(
        delayed(parallelProcess)(i) for i in range(num_cores_to_use)
    )
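Note that each worker handles every num_cores_to_use-th element of label_array, so the list is partitioned evenly across the processes with no shared state. Since the work is I/O-bound rather than CPU-bound, joblib's Parallel also accepts backend="threading", which would run the same code with threads instead of separate processes.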
Upvotes: 1