Reputation: 85
I am trying to find a more efficient replacement for the following piece of code (it is only one part of my code), to increase its speed:
for pr in some_list:
    Tp = T[partition[pr]].sum(0)
    Tpx = np.dot(Tp, xhat)
    hp = h[partition[pr]].sum(0)
    up = uk[partition[pr]].sum(0) / len(partition[pr])
    hpu = hpu + np.dot(hp.T, up)
    Tpu = Tpu + np.dot(Tp.T, up)
I have at least two more similar blocks of code. As you can see, I used fancy indexing three times (I really couldn't find another way). In my algorithm, this part needs to run very quickly, but right now it doesn't. I would really appreciate any suggestion.
Thank you all.
Best,
Upvotes: 1
Views: 1703
Reputation: 35080
If your partitions are few and have many elements each, you should consider swapping around the indices of your objects. Summing an array of shape (30,1000) along its second dimension should be faster than summing an array of shape (1000,30) along its first dimension, since in the former case you are always summing contiguous blocks of memory (i.e. arr[k,:] for each k) for each remaining index. So if you put the summation index last (and get rid of some trailing singleton dimension while you're at it), you might get a speed-up.
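Here is a minimal timing sketch of that effect; the shapes and repetition count are arbitrary (picked only for illustration), and the actual gap depends on your hardware and array sizes:

import numpy as np
from timeit import timeit

a = np.random.rand(30, 1000)   # summation axis last: each a[k,:] is contiguous
b = np.random.rand(1000, 30)   # summation axis first: strided access

print(timeit(lambda: a.sum(1), number=10000))  # sum over contiguous rows
print(timeit(lambda: b.sum(0), number=10000))  # sum over strided columns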
As hpaulj noted in a comment, it's not clear how your loop could be vectorized. However, since it's performance-critical, you could still try vectorizing some of the work.
I suggest that you store hp, up and Tp for each partition (after pre-allocating the arrays), then perform the scalar/matrix products in a single vectorized step. Also note that Tpx is unused in your example, so I omitted it here (whatever you're doing with it, you can do it similarly to the other examples):
part_len = len(some_list)            # number of partitions, N
Tpshape = (part_len,) + T.shape[1:]  # (N,30,100) if T was (1000,30,100)
hpshape = (part_len,) + h.shape[1:]  # (N,30,1) if h was (1000,30,1)
upshape = (part_len,) + uk.shape[1:] # (N,30,1) if uk was (1000,30,1)
Tp = np.zeros(Tpshape)
hp = np.zeros(hpshape)
up = np.zeros(upshape)
for ipr,pr in enumerate(some_list):
    Tp[ipr,:,:] = T[partition[pr]].sum(0)
    hp[ipr,:,:] = h[partition[pr]].sum(0)
    up[ipr,:,:] = uk[partition[pr]].sum(0)/len(partition[pr])

# compute vectorized dot products:
# Tpx unclear in original, omitted
# sum over second index (dot), sum over first index (sum in loop)
hpu = np.einsum('abc,abd->cd',hp,up) # shape (1,1)
Tpu = np.einsum('abc,abd->cd',Tp,up) # shape (100,1)
Clearly the key player is numpy.einsum. And of course, if hpu and Tpu had some prior values before the loop, you have to increment those values with the results from einsum above.
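For instance, if hpu and Tpu already held running totals (as in the original loop), a minimal sketch of that update would be:

hpu = hpu + np.einsum('abc,abd->cd', hp, up)
Tpu = Tpu + np.einsum('abc,abd->cd', Tp, up)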
As for einsum, it performs summations and contractions of arrays of arbitrary dimensions. The pattern appearing above, 'abc,abd->cd', when applied to 3d arrays A and B, will return a 2d array C, with the following definition (math pseudocode):
C(c,d) = sum_a sum_b A(a,b,c)*B(a,b,d)
For a given fixed value of the summation index a, what's inside is

sum_b A(a,b,c)*B(a,b,d)

which, if the c and d indices are kept, is equivalent to np.dot(A[a,:,:].T, B[a,:,:]). Since we're summing these matrices with respect to a too, this does exactly what your loopy version does, adding up each np.dot() contribution to the total sums.
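To convince yourself of this equivalence, here is a small self-contained check on random data (the shapes N, M, C, D are arbitrary and only for illustration):

import numpy as np

N, M, C, D = 5, 30, 100, 1
A = np.random.rand(N, M, C)
B = np.random.rand(N, M, D)

# vectorized contraction: C(c,d) = sum_a sum_b A(a,b,c)*B(a,b,d)
res_einsum = np.einsum('abc,abd->cd', A, B)

# loopy equivalent: accumulate np.dot contributions over the first index
res_loop = np.zeros((C, D))
for a in range(N):
    res_loop += np.dot(A[a,:,:].T, B[a,:,:])

print(np.allclose(res_einsum, res_loop))  # True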
Upvotes: 3