Reputation: 7587
I have a Python script that takes more than 20 hours to run to completion.
Since my code is pretty big, I will post a simplified one.
The first part of the code:
flag = 1
mydic = {}
for i in mylist:
    mydic[flag] = myfunction(i)
    flag += 1
mylist has more than 700 entries, and each call to myfunction() runs for around 20 seconds.
So I was wondering whether I can use parallel programming to split the iteration into two groups and run them simultaneously. Is that possible, and would it take half the time it does now?
The second part of the code:
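mymatrix = []
# mydic's keys run from 1 to flag - 1, so start at 1 rather than 0.
for n1 in range(1, flag):
    mat = []
    for n2 in range(1, flag):
        if n1 >= n2:
            mat.append(0)  # diagonal and lower triangle stay zero
        else:
            # mydic is a dict, so index it with [] instead of calling it
            res = myfunction2(mydic[n1], mydic[n2])
            mat.append(res)
    mymatrix.append(mat)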
So, if mylist has 700 entries, I want to create a 700x700 upper triangular matrix. But myfunction2() needs around 30 seconds per call. I don't know whether I can use parallel programming here too.
I cannot simplify myfunction() and myfunction2(), since both call an external API and return its results.
Do you have any suggestions for how I can change this to make it faster?
Upvotes: 0
Views: 201
Reputation: 15690
Based on your comments, I think it's very likely that the 30 seconds is mostly spent in the external API calls. I would add some timing code to check which portions of your code are actually responsible for the slowness.
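A minimal sketch of such a timing check, using only the standard library. Here `timed` is a hypothetical helper and `fake_api_call` is a stand-in for myfunction(), which is not shown in the question:

```python
import time

def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Stand-in for myfunction(); the real one calls an external API.
def fake_api_call(x):
    time.sleep(0.05)  # simulates time spent waiting on the network
    return x * 2

result, elapsed = timed(fake_api_call, 21)
print(f"result={result}, took {elapsed:.3f}s")
```

If almost all of the elapsed time is spent inside the API call rather than in your own processing, the work is I/O-bound and running calls concurrently should help.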
If it is from the external API calls, there are some easy fixes. The external API calls block, so you'll get a speedup if you can move to a parallel model (though 30s of blocking sounds huge to me).
I think it would be easiest to create a quick "task list" by having the two loops output a matrix of arguments to pass into a function. Then I'd pipe them into Celery to run the tasks. That should give you a decent speedup with a minimal amount of work.
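One way to build that task list is sketched below. It only collects the argument pairs for the cells above the diagonal; `build_tasks` is a hypothetical helper, and the small `keys` list stands in for the keys of mydic:

```python
from itertools import combinations

def build_tasks(keys):
    """Return every (k1, k2) pair with k1 < k2, i.e. the strictly upper-triangular cells."""
    return list(combinations(sorted(keys), 2))

keys = [1, 2, 3, 4]  # in the question these would be mydic's keys, 1..700
tasks = build_tasks(keys)
print(tasks)
```

For 700 entries this produces 700 * 699 / 2 = 244,650 pairs, each of which is an independent call to myfunction2() that can be handed to whatever task runner you choose.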
You would probably save a lot more time with the threading or multiprocessing modules to run tasks (or sections), or even by writing it all in Twisted Python, but that usually takes longer to implement than a simple Celery function.
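As a rough illustration of the thread-based route, here is a sketch that uses the standard library's concurrent.futures (a higher-level wrapper around threads, swapped in here for the raw threading module). `slow_call` is a stand-in for myfunction(), and the worker count of 8 is an arbitrary assumption:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_call(x):
    """Stand-in for myfunction(): blocks as if waiting on an external API."""
    time.sleep(0.1)
    return x * x

mylist = list(range(8))

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    # pool.map returns results in input order, so keys can be assigned afterwards
    results = list(pool.map(slow_call, mylist))
elapsed = time.perf_counter() - start

# Same 1-based keys as the original loop with its flag counter.
mydic = {i: r for i, r in enumerate(results, start=1)}
print(mydic, f"{elapsed:.2f}s")
```

Because the threads spend their time waiting on I/O, the calls overlap instead of running one after another. Whether the external API tolerates many simultaneous requests is something you would need to check before raising the worker count.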
The one caveat with the Celery approach is that you'll be dispatching a lot of work, so you'll need some functionality to poll for results. That could be a while loop that just calls time.sleep(10) and repeats until Celery has a result for every task. If you do it in Twisted, you can access and track results as each task finishes. I've never had to do something like this with multiprocessing, so I don't know how that would fit in.
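The polling loop described above might be shaped like the sketch below. So that it runs without a Celery broker, it uses concurrent.futures as a stand-in: `future.done()` plays the role that `AsyncResult.ready()` would play with Celery. `slow_pair` is a hypothetical substitute for myfunction2(), and the sleep interval is shortened for the demo:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_pair(a, b):
    """Stand-in for myfunction2()."""
    time.sleep(0.05)
    return a + b

pairs = [(1, 2), (1, 3), (2, 3)]
pool = ThreadPoolExecutor(max_workers=3)
pending = {pair: pool.submit(slow_pair, *pair) for pair in pairs}
results = {}

# Poll until every task has reported back.
while pending:
    for pair, future in list(pending.items()):
        if future.done():
            results[pair] = future.result()
            del pending[pair]
    if pending:
        time.sleep(0.01)  # the answer suggests ~10 s for the real workload

pool.shutdown()
print(results)
```

Keying the results by their (n1, n2) pair makes it straightforward to place each value into the right cell of the matrix once everything has finished.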
Upvotes: 1
Reputation:
How about using a generator for the second part instead of one of the for loops?
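def fn():
    # Yield the row indices one at a time (mydic's keys start at 1).
    for n1 in range(1, flag):
        yield n1

mymatrix = []
# Iterating the generator directly avoids an unhandled StopIteration
# from calling next() inside "while True".
for a in fn():
    mat = []  # start a fresh row for each index
    for n2 in range(1, flag):
        if a >= n2:
            mat.append(0)
        else:
            # mydic is a dict, so index it with [] instead of calling it
            mat.append(myfunction2(mydic[a], mydic[n2]))
    mymatrix.append(mat)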
Upvotes: 0