Reputation: 1631
I have a matrix X of size (d,N). In other words, there are N vectors with d dimensions each. For example,
X = [[1,2,3,4],[5,6,7,8]]
there are N=4 vectors of d=2 dimensions.
I also have a ragged array I (a list of lists) whose entries index columns of the X matrix. For example,
I = [ [0,1], [1,2,3] ]
The element I[0]=[0,1] indexes columns 0 and 1 of matrix X. Similarly, the element I[1] indexes columns 1, 2 and 3. Notice that the elements of I are lists that are not all the same length!
For each element of I, I would like to select the indexed columns of X and sum them into a single vector. Repeating this for every element of I builds a new matrix Y, which should have as many d-dimensional vectors as there are elements in I. In my example, Y will have 2 vectors of 2 dimensions.
In my example, the element I[0] says to take columns 0 and 1 of matrix X, sum these two 2-dimensional vectors, and put the result in column 0 of Y. Then the element I[1] says to sum columns 1, 2 and 3 of matrix X and put this new vector in column 1 of Y.
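To make the expected result concrete, here is a minimal sketch that writes out the two column sums for the example data by hand. The names `y0` and `y1` are only illustrative and do not appear elsewhere in this post.

```python
import numpy as np

# Example data from the question: N=4 column vectors of d=2 dimensions.
X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
I = [[0, 1], [1, 2, 3]]

# Column 0 of Y: sum of columns 0 and 1 of X.
y0 = X[:, I[0]].sum(axis=1)
# Column 1 of Y: sum of columns 1, 2 and 3 of X.
y1 = X[:, I[1]].sum(axis=1)

# Stack the two summed vectors side by side as the columns of Y.
Y = np.column_stack([y0, y1])
print(Y)
# [[ 3  9]
#  [11 21]]
```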
I can do this easily with a loop, but I would like to vectorize the operation if possible. My matrix X has hundreds of thousands of columns, and the indexing array I has tens of thousands of elements (each element is a short list of indices).
My loop-based code:
Y = np.zeros((d, len(I)))
for i, idx in enumerate(I):
    Y[:, i] = np.sum(X[:, idx], axis=1)
Upvotes: 3
Views: 1804
Reputation: 221524
Here's an approach -
# Get a flattened version of indices
idx0 = np.concatenate(I)
# Get indices at which we need to do "intervaled-summation" along axis=1
# (list() makes this work on Python 3, where map returns an iterator)
cut_idx = np.append(0, list(map(len, I)))[:-1].cumsum()
# Finally index into cols of array with flattened indices & perform summation
out = np.add.reduceat(X[:,idx0], cut_idx,axis=1)
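As a sanity check, the three steps above can be wrapped in a small function and compared against the explicit loop from the question. This is only a sketch: the helper name `ragged_column_sums` is hypothetical, and it assumes every index list in `I` is non-empty, since `np.add.reduceat` does not produce a zero sum for an empty interval.

```python
import numpy as np

def ragged_column_sums(X, I):
    """Sum the columns of X selected by each index list in I.

    Hypothetical helper for illustration. Assumes every list in I is
    non-empty, because np.add.reduceat does not return zero for an
    empty interval.
    """
    X = np.asarray(X)
    # Flattened version of all column indices.
    idx0 = np.concatenate(I)
    # Length of each group, then the start offset of each group in idx0.
    lens = np.array([len(idx) for idx in I])
    cut_idx = np.concatenate(([0], lens[:-1])).cumsum()
    # Gather the indexed columns and sum within each interval along axis=1.
    return np.add.reduceat(X[:, idx0], cut_idx, axis=1)

# Example data from the question.
X = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
I = [[0, 1], [1, 2, 3]]
Y = ragged_column_sums(X, I)

# Reference result from the explicit loop in the question.
Y_loop = np.zeros((X.shape[0], len(I)))
for i, idx in enumerate(I):
    Y_loop[:, i] = X[:, idx].sum(axis=1)

print(np.array_equal(Y, Y_loop))  # True
```

The list comprehension replaces `map(len, I)` so that the same code behaves identically on Python 2 and Python 3.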
Step-by-step run -
In [67]: X
Out[67]:
array([[ 1,  2,  3,  4],
       [15,  6, 17,  8]])
In [68]: I
Out[68]: array([[0, 2, 3, 1], [2, 3, 1], [2, 3]], dtype=object)
In [69]: idx0 = np.concatenate(I)
In [70]: idx0 # Flattened indices
Out[70]: array([0, 2, 3, 1, 2, 3, 1, 2, 3])
In [71]: cut_idx = np.append(0, list(map(len, I)))[:-1].cumsum()
In [72]: cut_idx # We need to do addition in intervals limited by these indices
Out[72]: array([0, 4, 7])
In [74]: X[:,idx0] # Select all of the indexed columns
Out[74]:
array([[ 1,  3,  4,  2,  3,  4,  2,  3,  4],
       [15, 17,  8,  6, 17,  8,  6, 17,  8]])
In [75]: np.add.reduceat(X[:,idx0], cut_idx,axis=1)
Out[75]:
array([[10,  9,  7],
       [46, 31, 25]])
Upvotes: 4