Reputation: 2802
I have a numpy array with three columns of the form:
x1 y1 f1
x2 y2 f2
...
xn yn fn
The (x,y) pairs may repeat. I would need another array such that each (x,y) pair appears once and the corresponding third column is the sum of all the f values that appeared next to (x,y).
For example, the array
1 2 4.0
1 1 5.0
1 2 3.0
0 1 9.0
would give
0 1 9.0
1 1 5.0
1 2 7.0
The order of rows is not relevant. What is the fastest way to do this in Python?
Thank you!
Upvotes: 3
Views: 3968
Reputation: 2802
Thanks to @hpaulj, finally found the simplest solution. If d contains the 3-column data:
ind =d[0:2].astype(int)
x = zeros(shape=(N,N))
add.at(x,list(ind),d[2])
This solution assumes that the (x,y) indices in the first two columns are integer and smaller than N. This is what I need and should have mentioned in the post.
Edit: Note that the above solution produces a sparse matrix with the sum values at position (x,y) within the matrix.
Upvotes: 1
Reputation: 231395
If you have scipy
, the sparse module does this kind of addition - again for an array where the 1st 2 columns are integers - ie. indexes.
from scipy import sparse
M = sparse.csr_matrix((d[:,0], (d[:,1],d[:,2])))
M = M.tocoo() # there may be a short cut to this csr coo round trip
x = np.column_stack([M.row, M.col, M.data]) # needs testing
For convenience in constructing certain kinds of linear algebra matrices, the csr
sparse array format sums values with duplicate indices. It's implemented in compiled code so should be fairly fast. But putting the data into M
and taking it back out might slow it down.
(ps. I haven't tested this script since I'm writing this on a machine without scipy
).
Upvotes: 0
Reputation: 103844
Certainly easily done in Python:
arr = np.array([[1,2,4.0],
[1,1,5.0],
[1,2,3.0],
[0,1,9.0]])
d={}
for x, y, z in arr:
d.setdefault((x,y), 0)
d[x,y]+=z
>>> d
{(1.0, 2.0): 7.0, (0.0, 1.0): 9.0, (1.0, 1.0): 5.0}
Then translate back to numpy:
>>> np.array([[x,y,d[(x,y)]] for x,y in d.keys()])
array([[ 1., 2., 7.],
[ 0., 1., 9.],
[ 1., 1., 5.]])
Upvotes: 0
Reputation: 221574
This would be one approach to solve it -
import numpy as np
# Input array
A = np.array([[1,2,4.0],
[1,1,5.0],
[1,2,3.0],
[0,1,9.0]])
# Extract xy columns
xy = A[:,0:2]
# Perform lex sort and get the sorted indices and xy pairs
sorted_idx = np.lexsort(xy.T)
sorted_xy = xy[sorted_idx,:]
# Differentiation along rows for sorted array
df1 = np.diff(sorted_xy,axis=0)
df2 = np.append([True],np.any(df1!=0,1),0)
# OR df2 = np.append([True],np.logical_or(df1[:,0]!=0,df1[:,1]!=0),0)
# OR df2 = np.append([True],np.dot(df1!=0,[True,True]),0)
# Get unique sorted labels
sorted_labels = df2.cumsum(0)-1
# Get labels
labels = np.zeros_like(sorted_idx)
labels[sorted_idx] = sorted_labels
# Get unique indices
unq_idx = sorted_idx[df2]
# Get counts and unique rows and setup output array
counts = np.bincount(labels, weights=A[:,2])
unq_rows = xy[unq_idx,:]
out = np.append(unq_rows,counts.ravel()[:,None],1)
Input & Output -
In [169]: A
Out[169]:
array([[ 1., 2., 4.],
[ 1., 1., 5.],
[ 1., 2., 3.],
[ 0., 1., 9.]])
In [170]: out
Out[170]:
array([[ 0., 1., 9.],
[ 1., 1., 5.],
[ 1., 2., 7.]])
Upvotes: 2