francesco
francesco

Reputation:

deleting rows of a numpy array based on uniqueness of a value

let's say I have a bi-dimensional array like that

numpy.array(
    [[0,1,1.2,3],
    [1,5,3.2,4],
    [3,4,2.8,4], 
    [2,6,2.3,5]])

I want to have an array formed eliminating whole rows based on uniqueness of values of last column, selecting the row to keep based on value of third column. e.g. in this case i would like to keep only one of the rows with 4 as last column, and choose the one which has the minor value of third column, having something like that as a result:

array([0,1,1.2,3],
      [3,4,2.8,4],
      [2,6,2.3,5])

thus eliminating row [1,5,3.2,4]

which would be the best way to do it?

Upvotes: 4

Views: 2468

Answers (2)

divenex
divenex

Reputation: 17238

This can be achieved efficiently in Numpy by combining lexsort and unique as follows

import numpy as np

a = np.array([[0, 1, 1.2, 3], 
              [1, 5, 3.2, 4],
              [3, 4, 2.8, 4], 
              [2, 6, 2.3, 5]])

# Sort by last column and 3rd column when values are equal
j = np.lexsort(a.T)

# Find first occurrence (=smallest 3rd column) of unique values in last column
k = np.unique(a[j, -1], return_index=True)[1]

print(a[j[k]])

This returns the desired result

[[ 0.   1.   1.2  3. ]
 [ 3.   4.   2.8  4. ]
 [ 2.   6.   2.3  5. ]]

Upvotes: 1

llimllib
llimllib

Reputation: 3702

My numpy is way out of practice, but this should work:

#keepers is a dictionary of type int: (int, int)
#the key is the row's final value, and the tuple is (row index, row[2])
keepers = {}
deletions = []
for i, row in enumerate(n):
    key = row[3]
    if key not in keepers:
        keepers[key] = (i, row[2])
    else:
        if row[2] > keepers[key][1]:
            deletions.append(i)
        else:
            deletions.append(keepers[key][0])
            keepers[key] = (i, row[2])
o = numpy.delete(n, deletions, axis=0)

I've greatly simplified it from my declarative solution, which was getting quite unwieldy. Hopefully this is easier to follow; all we do is maintain a dictionary of values that we want to keep and a list of indexes we want to delete.

Upvotes: 1

Related Questions