Shankar

Reputation: 3624

Efficient way to iterate through numpy arrays in parallel and create a new resultant array

I have 3 numpy arrays: dm_w, dm_s and dm_p. I need to iterate through these arrays in parallel and do some computation based on a check condition, as shown in the code below.

My code works well for smaller arrays, but takes too long with larger arrays. I need a faster, more efficient method to achieve this and would appreciate some expert opinion.

My code:

prox_mat = []
for w_dist, s_dist, PI in zip(np.nditer(dm_w), np.nditer(dm_s), np.nditer(dm_p)):
    if PI == 0.0:
         proximity_score = ((w_dist + len(np.unique(dm_s) * s_dist)) / 
                           (dm_w.shape[0] * len(np.unique(dm_s))))
         prox_mat.append(proximity_score)
    else:
         proximity_score = ((w_dist + len(np.unique(dm_s) * s_dist)) / 
                           (dm_w.shape[0] * len(np.unique(dm_s)))) * log10(10 * PI)
         prox_mat.append(proximity_score)

ps = np.array(prox_mat)
ps = np.reshape(ps, dm_w.shape)

Upvotes: 1

Views: 1374

Answers (1)

U2EF1

Reputation: 13261

Several things. First, the computation of np.unique(dm_s) should be pulled outside of the loop, since its result does not change between iterations. Beyond that, it looks like:

len(np.unique(dm_s) * s_dist) == len(np.unique(dm_s))

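This holds because multiplying an array by a scalar does not change its length, so the expression should either be pulled out of the loop as well or is a mistake.

In any case, we should just vectorize the for-loop/append construct: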

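import numpy as np

dm_s_uniques = len(np.unique(dm_s))  # computed once instead of on every iteration
with np.errstate(divide="ignore"):   # log10(0) yields -inf; handled on the next line
    logs = np.log10(10 * dm_p)
logs[logs == -np.inf] = 1            # where dm_p == 0, multiply by 1 (no log scaling)
prox_mat = ((dm_w + dm_s_uniques) / (dm_w.shape[0] * dm_s_uniques)) * logs

ps = prox_mat  # element-wise operations already preserve dm_w.shape, so no reshape is needed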
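
To check that the vectorized form matches the original loop, something like the following sketch may help. It reproduces the formula exactly as written in the question (so s_dist does not actually enter the result), and it uses a boolean mask instead of patching -inf values. The function names proximity_loop and proximity_vectorized and the random test arrays are made up for illustration.

```python
import numpy as np


def proximity_loop(dm_w, dm_s, dm_p):
    """Reference version: element-by-element loop, formula as written in the question."""
    n_unique = len(np.unique(dm_s))
    denom = dm_w.shape[0] * n_unique
    out = []
    for w_dist, pi in zip(dm_w.ravel(), dm_p.ravel()):
        score = (w_dist + n_unique) / denom
        if pi != 0.0:
            score *= np.log10(10 * pi)
        out.append(score)
    return np.array(out).reshape(dm_w.shape)


def proximity_vectorized(dm_w, dm_s, dm_p):
    """Vectorized version: whole-array arithmetic, no Python-level loop."""
    n_unique = len(np.unique(dm_s))
    base = (dm_w + n_unique) / (dm_w.shape[0] * n_unique)
    # Scaling factor is 1 where dm_p == 0, log10(10 * dm_p) elsewhere.
    factor = np.ones_like(base, dtype=float)
    nonzero = dm_p != 0
    factor[nonzero] = np.log10(10 * dm_p[nonzero])
    return base * factor


# Hypothetical small inputs, with a couple of zeros forced into dm_p.
rng = np.random.default_rng(0)
dm_w = rng.random((4, 4))
dm_s = rng.integers(0, 3, size=(4, 4)).astype(float)
dm_p = rng.random((4, 4))
dm_p[0, 0] = 0.0
dm_p[2, 3] = 0.0

loop_result = proximity_loop(dm_w, dm_s, dm_p)
vec_result = proximity_vectorized(dm_w, dm_s, dm_p)
print(np.allclose(loop_result, vec_result))
```

Taking the log only over the nonzero entries avoids the divide-by-zero warning that np.log10 emits for zeros, at the cost of one extra boolean mask.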

It looks like I map

Upvotes: 4
