Reputation: 87
I have a 2D array (A) with multiple columns and rows, and an array (B) with the same number of rows. (A) contains duplicate rows, and I want to collapse each set of duplicates into one unique row while adding up the corresponding values in (B). Currently I am using a dictionary to solve this, but I think it is not ideal and too slow when the arrays are long:
example_keys = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]]
example_vals = [[2], [3], [1], [10]]

example_dict = {}
for row, val in zip(example_keys, example_vals):
    state_key = tuple(row)  # lists are unhashable, so use a tuple as the key
    if state_key in example_dict:
        # Duplicate row: add the value
        example_dict[state_key][0] += val[0]
    else:
        # New row: create the entry (copy so the input list is not mutated)
        example_dict[state_key] = list(val)
My desired output would be these two arrays:
edited_keys = [[1,0,0,0], [1,1,0,0], [1,1,1,0]]
edited_vals = [[12], [3], [1]]
The order of the arrays does not matter, as long as the rows are coherent between the arrays. This also needs to work with multiple duplicate rows, not just two. Is there some way to create these arrays, by smartly manipulating the arrays using numpy? Thanks :)
Upvotes: 0
Views: 145
Reputation: 53079
You could use np.unique:
unq, idx, inv = np.unique(example_keys, axis=0,
                          return_index=True, return_inverse=True)
# idx holds the first-occurrence positions of the (sorted) unique rows;
# remap it so the rows come out in order of appearance instead
aux = np.bincount(idx)
nz = aux.nonzero()
aux[idx] = np.arange(idx.size)
idx = aux[nz]
new_keys = unq[idx]
# sum the values of duplicate rows with a weighted bincount,
# then reorder and restore the (n, 1) column shape
new_vals = np.bincount(inv, np.ravel(example_vals))[idx[:, None]]
new_keys
# array([[1, 0, 0, 0],
# [1, 1, 0, 0],
# [1, 1, 1, 0]])
new_vals
# array([[12.],
# [ 3.],
# [ 1.]])
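For what it's worth, the bincount scatter above just recovers first-appearance order, which np.argsort(idx) also does; here is a minimal equivalent sketch (the inv.ravel() guard is my addition, since the shape of the inverse returned for axis=0 has varied between NumPy versions):

```python
import numpy as np

example_keys = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]]
example_vals = [[2], [3], [1], [10]]

unq, idx, inv = np.unique(example_keys, axis=0,
                          return_index=True, return_inverse=True)
inv = inv.ravel()  # guard against version-dependent inverse shape

# argsort(idx) permutes the sorted unique rows back into
# first-appearance order, replacing the bincount scatter trick
order = np.argsort(idx)

new_keys = unq[order]
# sum the values of duplicate rows via a weighted bincount,
# then reorder and restore the (n, 1) column shape
new_vals = np.bincount(inv, weights=np.ravel(example_vals))[order][:, None]
```

Note that new_vals comes out as floats either way, because bincount returns a float array whenever weights are used; cast with .astype(int) if you need integers back.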
Upvotes: 1