Reputation: 124
I have a [n x n] matrix containing values belonging to different groups, and a [1 x n] vector defining to which group each element belongs. (n usually ~1E4, in this example n=4)
I want to calculate a matrix obtained by summing together all the elements belonging to the same group.
I use np.where() to calculate the indices where the elements of each group are located. When I use the calculated indices, I do not obtain the expected elements, because I select pairs of positions instead of ranges (I'm used to Matlab, where I can simply select M(idx1,idx2) ).
import numpy as np
n=4
M = np.random.rand(n,n)
print(M)
# This vector defines to which group each element belong
belongToGroup = np.array([0, 1, 0, 2])
nGroups=np.max(belongToGroup);
# Calculate a matrix obtained by summing elements belonging to the same group
M_sum = np.zeros((nGroups+1,nGroups+1))
for g1 in range(nGroups+1):
idxG1 = np.where(belongToGroup==g1)
for g2 in range(nGroups+1):
idxG2 = np.where(belongToGroup==g2)
print('g1 = ' + str(g1))
print('g2 = ' + str(g2))
print(idxG1[0])
print(idxG2[0])
print(M[idxG1[0],idxG2[0]])
print(np.sum(M[idxG1[0],idxG2[0]]))
M_sum[g1,g2]=np.sum(M[idxG1[0],idxG2[0]])
print('')
print('Example of the problem:')
print('Elements I would like to sum to obtain M_sum[0,0]')
print(M[0:2,0:2])
print('Elements that are summed instead')
print(M[[0,1],[0,1]])
Example of the problem: In the example above, the element M_sum[0,0] should be the sum of M[0,0], M[0,1], M[1,0], and M[1,1] Instead, it is calculated as the sum of M[0,0] and M[1,1]
Upvotes: 1
Views: 1350
Reputation: 53029
You can use np.ix_
to obtain matlab-ish behavior:
A = np.arange(9).reshape(3, 3)
A[[1,2],[0,2]]
# array([3, 8])
A[np.ix_([1,2],[0,2])]
# array([[3, 5],
# [6, 8]])
Under the hood, np.ix_
does what @hpaulj describes in detail:
np.ix_([1,2],[0,2])
# (array([[1],
# [2]]), array([[0, 2]]))
You can apply this to your specific problem as follows:
M = np.random.randint(0, 10, (n, n))
M
# array([[6, 2, 7, 1],
# [6, 7, 9, 5],
# [9, 4, 3, 2],
# [3, 1, 7, 9]])
idx = np.array([0, 1, 0, 2])
ng = idx.max() + 1
out = np.zeros((ng, ng), M.dtype)
np.add.at(out, np.ix_(idx, idx), M)
out
# array([[25, 6, 3],
# [15, 7, 5],
# [10, 1, 9]])
As an aside: There is a faster but less obvious solution that relies on flat indexing:
np.bincount(np.ravel_multi_index(np.ix_(idx, idx), (ng, ng)).ravel(), M.ravel(), ng*ng).reshape(ng, ng)
# array([[25., 6., 3.],
# [15., 7., 5.],
# [10., 1., 9.]])
Upvotes: 1
Reputation: 231385
In MATLAB, indexing with 2 lists (actually matrices) picks a block. numpy
on the other hand tries to broadcast the indexing arrays against each other, and returns selected points. Its behavior is close to what sub2ind
does in MATLAB.
In [971]: arr = np.arange(16).reshape(4,4)
In [972]: arr
Out[972]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [973]: i1, i2 = np.array([0,2,3]), np.array([1,2,0])
Indexing with 2 1d arrays of the same size:
In [974]: arr[i1,i2]
Out[974]: array([ 1, 10, 12])
This, in effect returns [arr[0,1], arr[2,2], arr[3,0]]
, one element for each point of matching indices.
But if I turn one index into a 'column vector', it selects from rows, while i2
selects from columns.
In [975]: arr[i1[:,None], i2]
Out[975]:
array([[ 1, 2, 0],
[ 9, 10, 8],
[13, 14, 12]])
MATLAB makes the block indexing easy, while individual access is harder. In numpy
the block access is a bit harder, though the underlying mechanics is the same.
With your example i1[0]
and i2[0]
can be arrays like:
array([0, 2]), array([3])
(2,) (1,)
The shape (1,) array can broadcast with the (2,) or with a (2,1) array just as well. Your code would fail if instead is[0]
were np.array([0,1,2])
, a (3,) array which can't pair with the (2,) array. But with a (2,1) it produces a (2,3) block.
Upvotes: 2