Reputation: 306
I am trying to remove every row whose first-column value appears only once in the array, keeping only the rows whose first-column value is duplicated. My array looks like this,
[['Aaple' 'Red']
['Aaple' '0.0']
['Banana' 'Yellow']
['Banana' '0.0']
['Orange' 'Orange']
['Pear' 'Yellow']
['Pear' '0.0']
['Strawberry' 'Red']]
I want it to look like this,
[['Aaple' 'Red']
['Aaple' '0.0']
['Banana' 'Yellow']
['Banana' '0.0']
['Pear' 'Yellow']
['Pear' '0.0']]
That is, the rows whose value in column one is unique ('Orange' and 'Strawberry') are removed. My current code looks like this,
import numpy as np

arr = np.array(["Aaple", "Pear", "Banana"])
arr2 = np.array([["Strawberry", "Red"], ["Aaple", "Red"], ["Orange", "Orange"], ["Pear", "Yellow"], ["Banana", "Yellow"]])
# Turn arr into a column and pair each name with a zero (stored as the string '0.0')
arr = arr.reshape(-1, 1)
zero_arr = np.zeros((len(arr), 1))
arr = np.column_stack((arr, zero_arr))
# Stack both arrays and sort the rows by the first column
combine = np.vstack((arr2, arr))
sort = combine[combine[:, 0].argsort()]
# sort is the first array shown above
By adding x = sort[:-1][sort[1:] == sort[:-1]] I was able to get ['Aaple' 'Banana' 'Pear'], the first-column values of the rows I want to keep. What would be the next steps?
Upvotes: 0
Views: 52
Reputation: 30609
It may be easier to use pandas:
import pandas as pd

df = pd.DataFrame(sort, columns=list('ab'))
# keep only the rows whose value in column 'a' occurs more than once
df[df.groupby('a').a.transform('count') > 1].values
Result:
array([['Aaple', 'Red'],
       ['Aaple', '0.0'],
       ['Banana', 'Yellow'],
       ['Banana', '0.0'],
       ['Pear', 'Yellow'],
       ['Pear', '0.0']], dtype=object)
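If you would rather stay in NumPy, the same filter can probably be written with np.unique. This is only a minimal sketch, not a tested drop-in: it rebuilds the sorted array from the question by hand so that it runs on its own, and the names inverse, counts, mask and result are my own. The idea is to count how often each first-column value occurs and keep the rows whose count is greater than one.

```python
import numpy as np

# The sorted array from the question, written out so the snippet is self-contained
sort = np.array([
    ["Aaple", "Red"],
    ["Aaple", "0.0"],
    ["Banana", "Yellow"],
    ["Banana", "0.0"],
    ["Orange", "Orange"],
    ["Pear", "Yellow"],
    ["Pear", "0.0"],
    ["Strawberry", "Red"],
])

# counts[i] is the number of occurrences of the i-th unique key;
# inverse maps every row back to the index of its key
_, inverse, counts = np.unique(sort[:, 0], return_inverse=True, return_counts=True)

# True for rows whose first-column value occurs more than once
mask = counts[inverse] > 1
result = sort[mask]
print(result)
```

Since the question already has the duplicated names in x, sort[np.isin(sort[:, 0], x)] should give the same rows. The np.unique version does not need the array to be sorted first.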
Upvotes: 1