Reputation: 411
Given the following array:
np.array([1,2,3,3,3,4,4,5])
I would like to get the indices of the duplicate values only, so the output would look like:
np.array([3,4,6])
This seems so basic that I feel I am missing some simple NumPy command, but I cannot find a simple non-looping solution. Does anyone have any ideas?
Ideally the solution would be efficient (no looping) and scale to multiple dimensions.
A multidimensional case could look like this. Given
[[3,2,1,1], [2,2,1,1]]
it should return
[3]
since the column [1,1] at index 3 duplicates the column at index 2.
Upvotes: 1
Views: 199
Reputation: 12397
Find the index of the first occurrence of each unique value using np.unique with return_index=True; every remaining index is a duplicate. This solution does NOT require your array to be sorted, and it extends easily to the multi-dimensional case. With a as the input array:
np.delete(np.arange(a.size), np.unique(a,return_index=True)[1])
output:
[3 4 6]
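For reference, here is a self-contained sketch of the same idea using a boolean mask instead of np.delete. The function name duplicate_indices is made up for illustration, and the second call uses an unsorted array to show that the order of the input does not matter:

```python
import numpy as np

def duplicate_indices(a):
    """Return indices of every element that repeats an earlier value."""
    a = np.asarray(a).ravel()
    # Index of the first occurrence of each distinct value.
    _, first_idx = np.unique(a, return_index=True)
    # Start with everything flagged, then clear the first occurrences.
    is_duplicate = np.ones(a.size, dtype=bool)
    is_duplicate[first_idx] = False
    return np.flatnonzero(is_duplicate)

print(duplicate_indices([1, 2, 3, 3, 3, 4, 4, 5]))  # [3 4 6]
print(duplicate_indices([3, 1, 3, 2, 1]))            # [2 4]
```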
UPDATE: Per the OP's update on the multi-dimensional case, view each column as a single void element so that np.unique compares whole columns:
at = a.T
b = np.ascontiguousarray(at).view(np.dtype((np.void, at.dtype.itemsize * at.shape[1])))
np.delete(np.arange(b.size), np.unique(b,return_index=True)[1])
Or, more simply, as suggested by @Adrix in the comments:
np.delete(np.arange(a.shape[1]), np.unique(a,return_index=True, axis=1)[1])
output:
[3]
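A runnable sketch of the axis-based variant, wrapped in a small helper so it can be applied along either axis. The name duplicate_slices and its axis parameter are assumptions made for this example, not part of NumPy:

```python
import numpy as np

def duplicate_slices(a, axis):
    """Return indices along `axis` of slices that repeat an earlier slice."""
    a = np.asarray(a)
    # With axis given, np.unique compares whole rows/columns and
    # return_index reports where each distinct slice first appears.
    _, first_idx = np.unique(a, axis=axis, return_index=True)
    # Drop the first occurrences; what is left are the duplicates.
    return np.delete(np.arange(a.shape[axis]), first_idx)

a = np.array([[3, 2, 1, 1],
              [2, 2, 1, 1]])
print(duplicate_slices(a, axis=1))  # [3]
```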
Upvotes: 1
Reputation: 5141
One way is to combine np.where and np.diff (in a sorted array, duplicates are the items whose difference from the previous item is zero):
>>> arr = np.array([1,2,3,3,3,4,4,5])
>>> np.where(np.diff(arr) == 0)[0] + 1
array([3, 4, 6])
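Note that np.diff only compares neighbours, so this relies on equal values being adjacent. If the input might be unsorted, one possible extension (a sketch, not part of the original answer; the function name is hypothetical) is to sort with a stable argsort first and map the hits back to the original positions:

```python
import numpy as np

def duplicate_indices_via_diff(arr):
    arr = np.asarray(arr)
    # A stable sort keeps equal values in their original relative order,
    # so the first occurrence of each value stays first within its run.
    order = np.argsort(arr, kind="stable")
    sorted_arr = arr[order]
    # Positions (in sorted order) whose value equals the previous one.
    dup_pos = np.where(np.diff(sorted_arr) == 0)[0] + 1
    # Map back to original indices and return them in ascending order.
    return np.sort(order[dup_pos])

print(duplicate_indices_via_diff([1, 2, 3, 3, 3, 4, 4, 5]))  # [3 4 6]
print(duplicate_indices_via_diff([3, 1, 3, 2, 1]))            # [2 4]
```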
Upvotes: 1
Reputation: 36594
This could work:
[i for i in np.arange(len(x)) if i not in np.unique(x, return_index=True)[1]]
[3, 4, 6]
Admittedly, the filtering part can definitely be improved: np.unique is re-evaluated for every index, so this gets slow on large arrays.
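One way to tighten the filtering, sketched here with np.setdiff1d as a swapped-in technique rather than the approach above, is to compute the first-occurrence indices once and subtract them from the full index range:

```python
import numpy as np

x = np.array([1, 2, 3, 3, 3, 4, 4, 5])

# Compute the first-occurrence indices a single time.
_, first_idx = np.unique(x, return_index=True)
# Set difference: all indices minus the first occurrences.
dup_idx = np.setdiff1d(np.arange(len(x)), first_idx)

print(dup_idx.tolist())  # [3, 4, 6]
```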
Upvotes: 0