Get indices of duplicate columns only

Given the following array:

np.array([1,2,3,3,3,4,4,5])

I would like to get the indices of the duplicate values only (every occurrence after the first), so the output would be:

np.array([3,4,6])

This seems so basic that I feel like I am missing a simple NumPy command, but I cannot find a non-looping solution. Does anyone have any ideas?

Ideally the solution would be efficient (no looping) and scale to multiple dimensions.

A multidimensional case (looking for duplicate columns) could look like this. Given

[[3,2,1,1], [2,2,1,1]]

It should return

[3]

since the column at index 3, [1, 1], duplicates the column at index 2.

Upvotes: 1

Answers (3)

Ehsan

With a as the input array, find the index of the first occurrence of each unique value using np.unique(a, return_index=True); all the remaining indices are duplicates. This solution does NOT require your array to be sorted, and you can easily extend it to multiple dimensions (please provide a sample input/output for the multidimensional case and I will update it):

np.delete(np.arange(a.size), np.unique(a,return_index=True)[1])

output:

[3 4 6]
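A self-contained version of the one-liner above, wrapped in a function for reuse (the name `duplicate_indices` is illustrative, not part of NumPy). It relies on `np.unique(..., return_index=True)` returning the index of the first occurrence of each unique value, which is why unsorted input works too:

```python
import numpy as np


def duplicate_indices(a):
    """Indices of every occurrence after the first, for a 1-D array."""
    a = np.asarray(a)
    # return_index=True gives the index of the first occurrence of each unique value
    _, first = np.unique(a, return_index=True)
    # Every index that is not a first occurrence belongs to a duplicate
    return np.delete(np.arange(a.size), first)


print(duplicate_indices([1, 2, 3, 3, 3, 4, 4, 5]))  # [3 4 6]
print(duplicate_indices([5, 1, 5, 2, 1]))           # [2 4]
```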

UPDATE: Per the OP's update on the multidimensional case, view each column as a single np.void element so that np.unique compares whole columns:

at = a.T
b = np.ascontiguousarray(at).view(np.dtype((np.void, at.dtype.itemsize * at.shape[1])))
np.delete(np.arange(b.size), np.unique(b,return_index=True)[1])

Or, more simply, as suggested by @Adrix in the comments:

np.delete(np.arange(a.shape[1]), np.unique(a,return_index=True, axis=1)[1])

output:

[3]
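A sketch of the `axis` variant as a reusable helper, assuming a NumPy version whose `np.unique` accepts the `axis` argument (1.13 or later). The helper name and its `axis` parameter are illustrative: `axis=1` reports duplicate columns and `axis=0` reports duplicate rows, always excluding the first occurrence:

```python
import numpy as np


def duplicate_indices_along(a, axis=1):
    """Indices of duplicate slices along `axis` (columns for 1, rows for 0)."""
    a = np.asarray(a)
    # With axis given, np.unique compares whole slices along that axis
    _, first = np.unique(a, axis=axis, return_index=True)
    # Drop the first occurrences; what is left are the duplicates
    return np.delete(np.arange(a.shape[axis]), first)


a = np.array([[3, 2, 1, 1],
              [2, 2, 1, 1]])
print(duplicate_indices_along(a, axis=1))    # [3]
print(duplicate_indices_along(a.T, axis=0))  # [3]  (rows of a.T are columns of a)
```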

Upvotes: 1

Aivar Paalberg

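One way is to combine np.where and np.diff: duplicates are items whose difference from the previous item equals zero. Note that this only works when equal values are adjacent, e.g. when the array is sorted: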

>>> arr = np.array([1,2,3,3,3,4,4,5])
>>> np.where(np.diff(arr) == 0)[0] + 1
array([3, 4, 6])
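To extend the `np.diff` idea to unsorted input, one option (a swapped-in technique sketched with an illustrative helper name, not part of the answer above) is to sort first with a stable `argsort`, compare each sorted value with its predecessor, and map the flagged positions back to original indices:

```python
import numpy as np


def duplicate_indices_unsorted(arr):
    arr = np.asarray(arr)
    # A stable sort keeps equal values in their original order,
    # so the first element of each run of equal values is its first occurrence
    order = np.argsort(arr, kind="stable")
    sorted_vals = arr[order]
    # Flag sorted positions whose value equals the previous one
    is_dup = np.zeros(arr.size, dtype=bool)
    is_dup[1:] = sorted_vals[1:] == sorted_vals[:-1]
    # Map back to original indices and return them in ascending order
    return np.sort(order[is_dup])


print(duplicate_indices_unsorted([1, 2, 3, 3, 3, 4, 4, 5]))  # [3 4 6]
print(duplicate_indices_unsorted([5, 1, 5, 2, 1]))           # [2 4]
```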

Upvotes: 1

Nicolas Gervais

This could work:

[i for i in np.arange(len(x)) if i not in np.unique(x, return_index=True)[1]]

[3, 4, 6]

Admittedly, the filtering part can definitely be improved.
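One way to improve the filtering, sketched here with an illustrative helper name, is to call `np.unique` once instead of on every iteration of the comprehension, mark the first occurrences in a boolean mask, and let `np.flatnonzero` return the indices that remain unmarked:

```python
import numpy as np


def duplicate_indices_mask(x):
    x = np.asarray(x)
    # Call np.unique once, outside any loop
    _, first = np.unique(x, return_index=True)
    # Mark first occurrences; every unmarked index is a duplicate
    is_first = np.zeros(x.size, dtype=bool)
    is_first[first] = True
    return np.flatnonzero(~is_first)


x = np.array([1, 2, 3, 3, 3, 4, 4, 5])
print(duplicate_indices_mask(x))  # [3 4 6]
```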

Upvotes: 0