Reputation: 1682
I am trying to find a faster way of writing the following code. Here is the part of my program I am trying to speed up, hopefully by using more built-in functions:
import time

num = 0
rand1 = rand_pos[0:10]  # time only the first 10 random points
time1 = time.perf_counter()
for rand in rand1:
    for gal in gal_pos:
        num += dist(gal, rand)
time2 = time.perf_counter()
time_elap = time2 - time1
print(time_elap)
Here, rand_pos and gal_pos are lists of length 900 and 1 million respectively, and dist is a function (shown below) that uses the Euclidean distance between two points to decide whether they lie within radius of each other. To get a timing measurement I used only a slice of rand_pos (the first 10 points). That alone takes about 125 seconds, which is way too long: running the code over all of rand_pos would take about three hours (90 × 125 s ≈ 11,250 s). Is there a faster way I can do this?
Here is the dist function:
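import scipy.spatial.distance

def dist(pos1, pos2):
    # Return 1 if pos1 and pos2 are closer than the global `radius`, else 0.
    n = 0
    dist_x = pos1[0] - pos2[0]
    dist_y = pos1[1] - pos2[1]
    dist_z = pos1[2] - pos2[2]
    # Cheap bounding-box check before the exact distance; abs() is needed
    # because the coordinate differences can be negative.
    if abs(dist_x) < radius and abs(dist_y) < radius and abs(dist_z) < radius:
        positions = [pos1, pos2]
        distance = scipy.spatial.distance.pdist(positions, metric='euclidean')
        if distance[0] < radius:
            n = 1
    return n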
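The inner loop over gal_pos could be replaced by array operations on all galaxies at once. Below is a minimal sketch, assuming gal_pos can be converted to an (N, 3) NumPy array and that radius is the same threshold dist uses; the function name count_within_radius is hypothetical. Because dist returns 1 exactly when the Euclidean distance is below radius, summing dist(gal, rand) over gal_pos should give the same count as this function.

```python
import numpy as np


def count_within_radius(gal_pos, rand, radius):
    """Return how many points in gal_pos are strictly closer than radius to rand.

    gal_pos: array-like of shape (N, 3); rand: array-like of shape (3,).
    """
    gal = np.asarray(gal_pos, dtype=float)
    # Subtracting a (3,) vector from an (N, 3) array broadcasts over all rows.
    diff = gal - np.asarray(rand, dtype=float)
    # Compare squared distances so no square root is needed.
    sq_dist = (diff ** 2).sum(axis=1)
    return int(np.count_nonzero(sq_dist < radius ** 2))


# Hypothetical usage, replacing the double loop:
# num = sum(count_within_radius(gal_pos, rand, radius) for rand in rand1)
```

With 1 million galaxies the temporary arrays are only a few tens of MB, so one call per random point keeps memory use modest while moving the per-galaxy work into compiled NumPy code.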
Upvotes: 0
Views: 134
Reputation: 7309
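There is a function in scipy that does the expensive part of this in one call, computing the distance between every point in gal_pos and every point in rand1 (you then only need to count the entries below radius):
scipy.spatial.distance.cdist(gal_pos, rand1, metric='euclidean')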
It will probably be faster than anything you can write in pure Python, since the heavy lifting (looping over the pairwise combinations of the two arrays) is implemented in C.
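One caveat: cdist returns the full distance matrix, and for 1 million galaxies against all 900 random points that is 9 × 10^8 float64 values, about 7.2 GB. A minimal sketch that processes the random points in chunks and counts the pairs closer than radius might look like this; the function name count_pairs_cdist and the default chunk_size of 10 are hypothetical choices, not part of scipy.

```python
import numpy as np
from scipy.spatial.distance import cdist


def count_pairs_cdist(gal_pos, rand_pos, radius, chunk_size=10):
    """Count (galaxy, random) pairs whose Euclidean distance is below radius.

    Each cdist call builds a len(gal_pos) x chunk_size distance matrix, so the
    random points are processed in chunks to bound memory use.
    """
    gal = np.asarray(gal_pos, dtype=float)
    rnd = np.asarray(rand_pos, dtype=float)
    total = 0
    for start in range(0, len(rnd), chunk_size):
        d = cdist(gal, rnd[start:start + chunk_size], metric='euclidean')
        total += int(np.count_nonzero(d < radius))
    return total
```

With chunk_size=10 each distance matrix for 1 million galaxies is about 80 MB; larger chunks mean fewer calls but more memory per call.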
Currently your loop runs in Python, which adds overhead to every iteration, and on top of that you make many calls to pdist (up to one per pair of points). Even though pdist is very optimized, the overhead of making so many calls to it slows down your code. This type of performance issue was once described to me with a very useful analogy: it's like trying to have a conversation with someone over the phone by saying one word per phone call. Even though each word goes across the line very fast, the conversation takes a long time because you have to hang up and dial again after every word.
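To see the per-call overhead that the analogy describes, one can time one pdist call per pair against a single cdist call over the same points. This is a small sketch on made-up random data (the sizes 1000 and 5 are arbitrary); the absolute timings will depend on your machine, but the single batched call should be much faster while producing the same distances.

```python
import timeit

import numpy as np
from scipy.spatial.distance import cdist, pdist

rng = np.random.default_rng(0)
gal = rng.random((1000, 3))   # made-up "galaxy" positions for illustration
rnd = rng.random((5, 3))      # made-up "random" positions


def one_call_per_pair():
    # One pdist call per (galaxy, random) pair: many short "phone calls".
    return np.array([[pdist([g, r])[0] for r in rnd] for g in gal])


def one_batched_call():
    # A single cdist call computes the same matrix in compiled code.
    return cdist(gal, rnd)


# Both approaches give the same distances; only the call overhead differs.
assert np.allclose(one_call_per_pair(), one_batched_call())
print('one call per pair:', timeit.timeit(one_call_per_pair, number=3))
print('one batched call :', timeit.timeit(one_batched_call, number=3))
```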
Upvotes: 2
Reputation: 21914
While most of the optimization probably needs to happen within your dist function, here are some tips to speed things up:
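import numpy as np
import scipy.spatial.distance

# Don't manually sum; pass a generator to the built-in sum()
for rand in rand1:
    num += sum(dist(gal, rand) for gal in gal_pos)

# If you can vectorize something, then do. np.vectorize is mainly a
# convenience: it still calls dist once per galaxy, so the bigger win is
# rewriting dist itself with array operations.
new_dist = np.vectorize(dist, signature='(n),(n)->()')
for rand in rand1:
    num += np.sum(new_dist(gal_pos, rand))

# Use already-built code whenever possible (as already suggested)
scipy.spatial.distance.cdist(gal_pos, rand1, metric='euclidean')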
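In the same spirit of using already-built code, scipy also ships a KD-tree whose count_neighbors method counts close pairs directly, without building any distance matrix. Note that this is a different technique from cdist, swapped in here as a suggestion: the tree only examines pairs that can possibly be within range. A minimal sketch, assuming both point sets are (N, 3) arrays; the wrapper name count_pairs_kdtree is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def count_pairs_kdtree(gal_pos, rand_pos, radius):
    """Count (galaxy, random) pairs separated by at most radius.

    count_neighbors counts pairs with distance <= radius, whereas dist uses a
    strict '<'; the two only differ for pairs at exactly radius.
    """
    gal_tree = cKDTree(np.asarray(gal_pos, dtype=float))
    rand_tree = cKDTree(np.asarray(rand_pos, dtype=float))
    return int(gal_tree.count_neighbors(rand_tree, radius))
```

Building the tree over 1 million points is a one-off cost, after which the pair count for all 900 random points comes from a single call.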
Upvotes: 3