Assuming we have a two-dimensional array like the following:
import numpy as np

array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])
Rows 4 and 5 (counting from 0) are not unique in terms of their columns 2, 3 and 4, so we want to drop one of them. Is there an efficient way to drop rows of an array so that its rows are unique with respect to a selected set of columns?
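For reference, the requested operation can be written as a plain loop: keep the first row seen for each distinct tuple of values in the selected columns. This is a minimal sketch; the helper name unique_rows_by_columns is hypothetical, and it is meant as a baseline for checking the results of vectorized solutions rather than as an efficient one.

```python
import numpy as np

def unique_rows_by_columns(arr, cols):
    """Hypothetical helper: keep the first row for each distinct key in `cols`."""
    seen = set()
    keep = []
    for i, row in enumerate(arr):
        key = tuple(row[cols])  # values of the selected columns, hashable
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return arr[keep]  # original row order is preserved

array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])

result = unique_rows_by_columns(array1, [2, 3, 4])
print(result)
```

Row 5 is dropped because its values in columns 2, 3 and 4 (5, 656, 1385) already appeared in row 4.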
Convert to a pandas DataFrame, drop the duplicates there, and convert back to a NumPy array, as suggested by S.Mohsen:
Code:
import pandas as pd
import numpy as np
array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])
df = pd.DataFrame(data=array1)
print(df)
# Keep only the first row for each combination of values in columns 2, 3 and 4
df.drop_duplicates(subset=[2, 3, 4], inplace=True)
print(df)
# Convert the deduplicated DataFrame back to a NumPy array
array2 = df.to_numpy()
print(array2)
Output:
     0    1  2      3     4    5
0    1    4  3  64356  5435  434
1   11   46  3   7356   585   74
2   51  406  3    769  5435   24
3   12   45  5    656   135  134
4  112  475  5    656  1385  134
5   13   46  5    656  1385   19
     0    1  2      3     4    5
0    1    4  3  64356  5435  434
1   11   46  3   7356   585   74
2   51  406  3    769  5435   24
3   12   45  5    656   135  134
4  112  475  5    656  1385  134
[[    1     4     3 64356  5435   434]
 [   11    46     3  7356   585    74]
 [   51   406     3   769  5435    24]
 [   12    45     5   656   135   134]
 [  112   475     5   656  1385   134]]
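If the goal is only to filter the NumPy array, a variant of the same idea (a sketch, not part of the original answer) is to let DataFrame.duplicated build a boolean mask and index array1 with it directly, without modifying a DataFrame in place. Its keep parameter controls which occurrence survives: "first" (the default) keeps row 4, while "last" keeps row 5.

```python
import numpy as np
import pandas as pd

array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])

df = pd.DataFrame(array1)  # column labels are the integers 0..5

# True for every row whose (col 2, col 3, col 4) values already appeared earlier
dup_first = df.duplicated(subset=[2, 3, 4], keep="first").to_numpy()
array2 = array1[~dup_first]  # keeps rows 0-4

# True for every row whose key appears again later, so the last occurrence survives
dup_last = df.duplicated(subset=[2, 3, 4], keep="last").to_numpy()
array3 = array1[~dup_last]  # keeps rows 0-3 and 5

print(array2)
print(array3)
```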
A solution in pure NumPy, using np.unique on the selected columns:
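# Indices of the first occurrence of each distinct (col 2, col 3, col 4) row
_, idx = np.unique(array1[:, [2, 3, 4]], axis=0, return_index=True)
# The indices come back in sorted-key order; sort them to restore the original row order
array1[np.sort(idx)]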
Output:
array([[    1,     4,     3, 64356,  5435,   434],
       [   11,    46,     3,  7356,   585,    74],
       [   51,   406,     3,   769,  5435,    24],
       [   12,    45,     5,   656,   135,   134],
       [  112,   475,     5,   656,  1385,   134]])
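The NumPy approach can be wrapped in a small reusable function. In the sketch below, the name drop_duplicate_rows and its keep argument are hypothetical (loosely mirroring the pandas option); keeping the last occurrence works by running np.unique on the rows in reverse order and mapping the indices back.

```python
import numpy as np

def drop_duplicate_rows(arr, cols, keep="first"):
    """Hypothetical helper: rows of `arr` whose values in `cols` are unique.

    keep="first" keeps the first occurrence of each key, keep="last" the last.
    The original row order is preserved.
    """
    key = arr[:, cols]
    if keep == "first":
        _, idx = np.unique(key, axis=0, return_index=True)
    elif keep == "last":
        # The first occurrence in the reversed rows is the last one in the original
        _, rev_idx = np.unique(key[::-1], axis=0, return_index=True)
        idx = len(arr) - 1 - rev_idx
    else:
        raise ValueError("keep must be 'first' or 'last'")
    return arr[np.sort(idx)]

array1 = np.array([[1, 4, 3, 64356, 5435, 434],
                   [11, 46, 3, 7356, 585, 74],
                   [51, 406, 3, 769, 5435, 24],
                   [12, 45, 5, 656, 135, 134],
                   [112, 475, 5, 656, 1385, 134],
                   [13, 46, 5, 656, 1385, 19]])

first = drop_duplicate_rows(array1, [2, 3, 4])
last = drop_duplicate_rows(array1, [2, 3, 4], keep="last")
print(first)
print(last)
```

Like the one-liner above, this relies on np.unique sorting the key rows, so it costs O(n log n) rather than the O(n) of a hash-based pass, but it stays fully vectorized.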