deineomaklaut

Reputation: 87

Collapse duplicate rows in an array into a single unique row

I have a 2D array (A) with multiple rows and columns, and a 1D array (B) with one value per row of (A). (A) contains duplicate rows, and I want to collapse them into a single unique row while summing the corresponding values in (B). I am currently using a dictionary to do this, but I suspect it is not ideal and will be too slow for long arrays:

example_keys = [[1,0,0,0], [1,1,0,0], [1,1,1,0], [1,0,0,0]]
example_vals = [[2], [3], [1], [10]]
example_dict = {}

for i, row in enumerate(example_keys):

    state_key = tuple(row)

    if state_key in example_dict:
        # Just add value (index [0]: += on the inner lists would concatenate them)
        example_dict[state_key] += example_vals[i][0]
    else:
        # Create entry
        example_dict[state_key] = example_vals[i][0]

My desired output would be these two arrays:

edited_keys = [[1,0,0,0], [1,1,0,0], [1,1,1,0]]
edited_vals = [[12], [3], [1]]

The order of the rows does not matter, as long as the two arrays stay aligned row for row. This also needs to work when a row is duplicated more than once, not just twice. Is there a way to build these arrays with NumPy operations? Thanks :)

Upvotes: 0

Views: 145

Answers (1)

Paul Panzer

Reputation: 53079

You could use np.unique:

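import numpy as np

# unq: sorted unique rows; idx: first occurrence of each; inv: unique-row index per input row
unq, idx, inv = np.unique(example_keys, axis=0, return_index=True, return_inverse=True)

# change idx order to order of appearance (equivalent to np.argsort(idx))
aux = np.bincount(idx)
nz = aux.nonzero()
aux[idx] = np.arange(idx.size)
idx = aux[nz]

new_keys = unq[idx]
# sum example_vals per unique row, then reorder to match new_keys
new_vals = np.bincount(inv, np.ravel(example_vals))[idx[:, None]]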

new_keys
# array([[1, 0, 0, 0],
#        [1, 1, 0, 0],
#        [1, 1, 1, 0]])
new_vals
# array([[12.],
#        [ 3.],
#        [ 1.]])
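
Note that np.bincount with weights returns floats, which is why new_vals shows 12. rather than 12. If integer sums matter, one possible variant is sketched below. It is not part of the original answer: the function name collapse_rows is made up for illustration, and it swaps np.bincount for np.add.at so the sums keep the dtype of the values, and uses np.argsort for the reordering step. The np.ravel(inv) call is a defensive assumption, since the shape of the inverse returned with axis=0 has differed between NumPy releases.

```python
import numpy as np

def collapse_rows(keys, vals):
    """Collapse duplicate rows of `keys`, summing the matching entries of `vals`.

    Hypothetical helper for illustration; rows come back in order of first appearance.
    """
    keys = np.asarray(keys)
    vals = np.ravel(vals)

    unq, idx, inv = np.unique(keys, axis=0, return_index=True, return_inverse=True)
    inv = np.ravel(inv)  # defensive: make sure the inverse is 1-D

    # Accumulate in the dtype of `vals` instead of float64.
    sums = np.zeros(len(unq), dtype=vals.dtype)
    np.add.at(sums, inv, vals)

    # Reorder from sorted order to order of first appearance.
    order = np.argsort(idx)
    return unq[order], sums[order][:, None]

example_keys = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]]
example_vals = [[2], [3], [1], [10]]

new_keys, new_vals = collapse_rows(example_keys, example_vals)
print(new_keys.tolist())  # [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]]
print(new_vals.tolist())  # [[12], [3], [1]]
```

np.add.at is generally slower than np.bincount, so for very long arrays it may be preferable to keep bincount and cast the result back to an integer dtype afterwards.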

Upvotes: 1
