Reputation: 4248
I would like to get the indices of the unique rows in an array. Each unique row should get its own index, starting from zero. Here is an example:
import numpy as np
a = np.array([[0., 1.],
              [0., 2.],
              [0., 3.],
              [0., 1.],
              [0., 2.],
              [0., 3.],
              [0., 1.],
              [0., 2.],
              [0., 3.],
              [1., 1.],
              [1., 2.],
              [1., 3.],
              [1., 1.],
              [1., 2.],
              [1., 3.],
              [1., 1.],
              [1., 2.],
              [1., 3.]])
In the above array there are six unique rows:
import pandas as pd
b = pd.DataFrame(a).drop_duplicates().values
array([[0., 1.],
       [0., 2.],
       [0., 3.],
       [1., 1.],
       [1., 2.],
       [1., 3.]])
Each of these rows corresponds to an index (0, 1, 2, 3, 4, 5). Labelling every row of array a with the index of the unique row it matches, the result would be:
[0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5]
How can I get to this result in an efficient way?
Upvotes: 0
Views: 78
Reputation: 4248
This is what I got:
b = pd.DataFrame(a).drop_duplicates()
indexed_rows = np.zeros(a.shape[0], dtype=int)
# Compare every row of a with every unique row (O(n * m) row comparisons)
for index, i in enumerate(a):
    for unique_row, j in enumerate(b.values):
        if np.all(i == j):
            indexed_rows[index] = unique_row
The returned result is:
array([0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5])
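The nested loop compares every row of a with every unique row, so it does O(n * m) row comparisons in Python. A minimal sketch that stays in pandas and drops the Python loop: group on all columns with sort=False, so that groupby numbers the groups in order of first appearance, then read those numbers with ngroup(). The function name unique_row_ids is illustrative, and rows containing NaN are not handled (groupby drops NaN keys by default).

```python
import numpy as np
import pandas as pd

a = np.array([[0., 1.], [0., 2.], [0., 3.]] * 3 + [[1., 1.], [1., 2.], [1., 3.]] * 3)

def unique_row_ids(arr):
    """Label each row with the index of its unique row, in order of first appearance."""
    df = pd.DataFrame(arr)
    # Group by every column; sort=False numbers the groups in first-appearance order.
    return df.groupby(list(df.columns), sort=False).ngroup().to_numpy()

indexed_rows = unique_row_ids(a)
print(indexed_rows)
```

This gives the same labels as the loop, since drop_duplicates also keeps unique rows in order of first appearance.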
Upvotes: 0
Reputation: 18648
A pure NumPy solution:
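# np.complex was removed in NumPy 1.24; np.complex128 packs each row of two float64 values into one complex number
av = a.view(np.complex128).ravel()
_, inv = np.unique(av, return_inverse=True)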
Then inv is:
array([0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5], dtype=int64)
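Viewing the array as complex packs the two components of each row into a single complex number, preserving order: NumPy sorts complex numbers by real part first and then by imaginary part, which is exactly lexicographic row order. Note that np.unique numbers the unique rows in this sorted order; it matches the order of first appearance here only because the rows of a first appear already sorted. The view also requires a C-contiguous float64 array with exactly two columns. For other types, other approaches are possible.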
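The complex-view trick is limited to two float64 columns. Since NumPy 1.13, np.unique also accepts axis=0, which treats each row as a single item and works for any number of columns and any dtype. Because np.unique returns the unique rows sorted, the sketch below also asks for return_index and renumbers the rows by first appearance, matching the numbering in the question. The function name unique_row_ids is illustrative.

```python
import numpy as np

def unique_row_ids(arr):
    """Label each row of a 2-D array with the index of its unique row.

    Labels follow the order in which the unique rows first appear.
    """
    _, first_idx, inv = np.unique(arr, axis=0, return_index=True, return_inverse=True)
    inv = inv.ravel()  # keep the inverse 1-D across NumPy versions
    # Ids of the (sorted) unique rows, reordered by first appearance.
    order = np.argsort(first_idx)
    # rank[k] is the first-appearance position of sorted unique row k.
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)
    return rank[inv]

a = np.array([[0., 1.], [0., 2.], [0., 3.]] * 3 + [[1., 1.], [1., 2.], [1., 3.]] * 3)
print(unique_row_ids(a))
```

If labels in sorted order are acceptable, the inverse returned by np.unique(a, axis=0, return_inverse=True) is already the answer and the renumbering step can be dropped.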
Upvotes: 3
Reputation: 1529
A solution without NumPy or pandas:
a = [[0, 1],
     [0, 2],
     [0, 3],
     [0, 1],
     [0, 2],
     [0, 3],
     [0, 1],
     [0, 2],
     [0, 3],
     [1, 1],
     [1, 2],
     [1, 3],
     [1, 1],
     [1, 2],
     [1, 3],
     [1, 1],
     [1, 2],
     [1, 3]]
b = []
#= ALGORITHM
point = -1  # index of the last unique row found so far
# Fixed 1000 x 1000 lookup table: only works for non-negative integers below 1000 (change to dynamic)
cache = [[-1 for _ in range(1000)] for _ in range(1000)]
for i in a:
    x, y = i
    # Check what's going on here...
    # print("x: {0} y: {1} --> {2} (cache)".format(x, y, cache[x][y]))
    if cache[x][y] == -1:
        # First time this row is seen: give it the next index
        point += 1
        cache[x][y] = point
        b.append(point)
    else:
        # Row seen before: reuse its index
        b.append(cache[x][y])
#= TESTING
print(b)  # [0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5]
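The fixed 1000 x 1000 cache only handles non-negative integer values below 1000 (a negative value would silently index from the end of the list). A minimal sketch of the "dynamic" version the comment asks for, swapping the nested list for a dict keyed by the row converted to a tuple, so it works for rows of any length whose values are hashable. The function name unique_row_ids is illustrative.

```python
def unique_row_ids(rows):
    """Number each distinct row by its first appearance (0, 1, 2, ...)."""
    cache = {}  # maps tuple(row) -> assigned index
    result = []
    for row in rows:
        key = tuple(row)  # lists are unhashable; a tuple of numbers can be a dict key
        if key not in cache:
            cache[key] = len(cache)  # next free index
        result.append(cache[key])
    return result

a = [[0, 1], [0, 2], [0, 3]] * 3 + [[1, 1], [1, 2], [1, 3]] * 3
print(unique_row_ids(a))  # [0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5]
```

Each row is visited once with an average O(1) dict lookup, so no table has to be sized in advance.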
Upvotes: 0