RabbitBadger

Reputation: 589

Find duplicate values in a 2D array column

I have an array with shape (N, 2). Below is an example of the 2D array I have at hand:

[[0,2]
[0,3]
[1,2]
[1,3]
[1,4]]

I would like to get all the values in the second column that occur more than once. In the example above, that means returning the values 2 and 3.

Is there a specific np function for this sort of task?

It seems like it's the opposite of np.unique but I have yet to find a working function for this problem.

Upvotes: 3

Views: 1670

Answers (3)

yatu

Reputation: 88275

You could select the second column and use np.bincount to find the values that occur more than once (this works for non-negative integers):

import numpy as np

a = np.array([[0,2],
              [0,3],
              [1,2],
              [1,3],
              [1,4]])

np.flatnonzero(np.bincount(a[:,1])>1)
# array([2, 3], dtype=int64)

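Or, if the column holds large integers, np.unique will probably be a better option, since np.bincount allocates one counter for every integer up to the maximum value: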

u, c = np.unique(a[:,1], return_counts=True)
u[c>1]
# array([2, 3])
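
As a rough sketch of the np.unique variant wrapped in a reusable function: the helper name duplicated_values, its col parameter, and the sample array b are illustrative assumptions, not part of NumPy. The second array mixes a negative and a very large value, which np.bincount would reject (negative input raises ValueError) or handle wastefully (one counter per integer up to the maximum).

```python
import numpy as np

def duplicated_values(arr, col=1):
    """Return the sorted values that occur more than once in column `col`."""
    values, counts = np.unique(arr[:, col], return_counts=True)
    return values[counts > 1]

a = np.array([[0, 2], [0, 3], [1, 2], [1, 3], [1, 4]])
print(duplicated_values(a))  # [2 3]

# Hypothetical data with a negative and a very large value in the second column
b = np.array([[0, -5], [1, 10**12], [2, -5], [3, 10**12], [4, 7]])
print(duplicated_values(b).tolist())  # [-5, 1000000000000]
```

Because np.unique only looks at values that actually occur, its cost depends on the number of rows rather than on the magnitude of the values.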

Upvotes: 3

Ahmad Anis

Reputation: 2704

You can use Counter from collections to perform this task.

import numpy as np

z = np.array([[0,2],
              [0,3],
              [1,2],
              [1,3],
              [1,4]])

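Now you can count the values in the desired column and keep those that occur more than once.

from collections import Counter
# .tolist() converts NumPy scalars to plain Python ints before counting
dup = [item for item, count in Counter(z[:, 1].tolist()).items() if count > 1]
print(dup)
# [2, 3]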
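
One practical difference from an np.unique-based approach is ordering: Counter keeps keys in first-seen order (Python 3.7+), while np.unique returns sorted values. A small sketch, using a hypothetical unsorted column chosen only to make the difference visible:

```python
from collections import Counter
import numpy as np

# Hypothetical unsorted column, chosen to show the ordering difference
col = np.array([3, 2, 4, 2, 3])

# Counter keeps keys in the order they were first seen
by_first_seen = [v for v, n in Counter(col.tolist()).items() if n > 1]

# np.unique returns its values sorted
u, c = np.unique(col, return_counts=True)
by_sorted = u[c > 1].tolist()

print(by_first_seen)  # [3, 2]
print(by_sorted)      # [2, 3]
```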

Upvotes: 1

David Meu

Reputation: 1545

You probably need something like:

from collections import defaultdict

arr = [[0,2],
       [0,3],
       [1,2],
       [1,3],
       [1,4]]

# Count how often each second-column value occurs
d = defaultdict(int)
for item in arr:
    d[item[1]] += 1

# Print the values that were seen more than once
for k, v in d.items():
    if v > 1:
        print(k)

Upvotes: 1
