Tasos
Tasos

Reputation: 7587

How to make faster my script on python?

I have a script in python but it takes more than 20 hours to run until the end.

Since my code is pretty big, I will post a simplified one.

The first part of the code:

flag = 1
mydic = {}
for i in mylist:
    mydic[flag] = myfunction(i)
    flag += 1

mylist has more than 700 entries and each time I call myfunction it run for around 20sec.

So, I was thinking if I can use paraller programming to split the iteration into two groups and run it simultaneously. Is that possible and will I need the half time than before?

The second part of the code:

mymatrix = []
for n1 in range(0,flag):
    mat = []
    for n2 in range(0,flag):
        if n1 >= n2:
            mat.append(0)
        else:
            res = myfunction2(mydic(n1),mydic(n2))
            mat.append(res)
    mymatrix.append(mat)

So, if mylist has 700 entries, I want to create a 700x700 matrix where it is upper triangular matrix. But the myfunction2() needs around 30sec each time. I don't know if I can use parallel programming here too.

I cannot simplify the myfunction() and myfunction2() since they are functions where I call an external api and return the results.

Do you have any suggestion of how can I change it to make it faster.

Upvotes: 0

Views: 201

Answers (2)

Jonathan Vanasco
Jonathan Vanasco

Reputation: 15690

Based on your comments, I think it's very likely that the 30seconds of time is mostly due to external API calls. I would add some timing code to test what portions of your code are actually responsible for the slowness.

If it is from the external API calls, there are some easy fixes. The external API calls block, so you'll get a speedup if you can move to a parallel model ( though 30s of blocking sounds huge to me ).

I think it would be easiest to create a quick "task list" by having the output of 2 loops be a matrix of arguments to pass into a function. Then I'd pipe them into Celery to run the tasks. That should give you a decent speedup with a minimal amount of work.

You would probably save a lot more time with the threading or multiprocessing modules to run tasks (or sections) , or even write it all in Twisted python - but that usually takes longer than a simple celery function.

The one caveat with the Celery approach is that you'll be dispatching a lot of work - so you'll have to have some functionality to poll for results. That could be a while loop that just sleeps(10) and repeats itself until celery has a result for every task. If you do it in Twisted, you can access/track results on finish. I've never had to do something like this with multiprocessing, so don't know how that would fit in.

Upvotes: 1

user2776074
user2776074

Reputation:

how about using a generator for the second part instead of one of the for loops

def fn():
    for n1 in range(0, flag):
        yield n1

generate = fn()

while True:
    a = next(generate)
    for n2 in range(0, flag):
        if a >= n2:
            mat.append(0)
        else:
            mat.append(myfunction2(mydic(a),mydic(n2))
            mymatrix.append(mat)

Upvotes: 0

Related Questions