Reputation: 453
I have a pandas DataFrame with a column of about 10k values. I want to get an array of those values without duplicates that is also sorted and supports lookup by index.
import pandas as pd

df = pd.read_csv('path', sep=';')
arr = []
for i in df[0].values:
    # "in" scans the whole list on every iteration
    if i not in arr:
        arr.append(i)
This is very time- and memory-consuming: it iterates over the 10k-element array, checks for each element whether it is already stored in the new list, and appends it only if it is not. I know a set cannot contain duplicates, but a set is unordered, so I cannot easily look up an element by index and it does not stay sorted. Is there another possible solution?
Upvotes: 0
Views: 233
Reputation: 18916
You are looking for np.unique:
np.unique(df[0])
Or the pandas equivalent, .unique(), which returns the values in order of first appearance rather than sorted:
df[0].unique()
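To make the difference between the two calls concrete, here is a minimal sketch. The small DataFrame is a made-up stand-in for the CSV column in the question, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the column read from the CSV in the question
df = pd.DataFrame({0: [3, 1, 2, 3, 1, 2, 5]})

# np.unique returns the distinct values as a sorted NumPy array
sorted_unique = np.unique(df[0])

# Series.unique keeps the order of first appearance, so sort it if needed
appearance_order = df[0].unique()
sorted_from_pandas = np.sort(appearance_order)

# Both results are arrays, so lookup by position works
first = sorted_unique[0]
```

If the sorted order matters, np.unique gives it in one step; with .unique() an explicit sort is needed afterwards.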
Upvotes: 1
Reputation: 351
You can use pandas.DataFrame.drop_duplicates. For more information, see the drop_duplicates() documentation.
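As a rough sketch of how that could look for this question (the sample data is invented, and sorting plus resetting the index are my additions to get the sorted, index-addressable result the question asks for):

```python
import pandas as pd

# Hypothetical stand-in for the column read from the CSV in the question
df = pd.DataFrame({0: [3, 1, 2, 3, 1, 2, 5]})

deduped = (
    df.drop_duplicates(subset=[0])   # keep the first row for each distinct value
      .sort_values(by=0)             # order the remaining rows by that column
      .reset_index(drop=True)        # renumber rows 0..n-1 for positional lookup
)

# Plain NumPy array of the sorted, de-duplicated values
values = deduped[0].to_numpy()
```

This keeps the result as a DataFrame, which is handy if other columns should stay attached to the rows that survive.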
Upvotes: 2