From pandas array without duplicates to another data structure?

Question

I have a pandas DataFrame with a column of ~10k values. I want to get an array of those values without duplicates that also supports lookup by index and is sorted.

import pandas as pd
df = pd.read_csv('path',sep=';')
arr = []
for i in df[0].values:
    # keep only the first occurrence of each value
    if i not in arr:
        arr.append(i)

This is very time- and memory-consuming: it iterates over the 10k-element array, checks whether each element is already stored in the newly created list, and appends it only if it is not. I know a set cannot contain duplicates, but I cannot easily look up an element by index, and it cannot be sorted. Is there another possible solution?

Anton vBR · Accepted Answer

You are looking for np.unique:

np.unique(df[0])

Or use the pandas equivalent, .unique(), which returns the values in order of first appearance rather than sorted:

df[0].unique()
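
To see how the two calls differ, here is a minimal sketch. The small DataFrame is a hypothetical stand-in for the asker's CSV column, and the variable names are only illustrative. np.unique returns a sorted NumPy array, while Series.unique() keeps the order of first appearance, so it needs an explicit sort if sorted output is wanted. Both results support lookup by position.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the column df[0] from the question.
df = pd.DataFrame({0: [3, 1, 2, 3, 1, 5]})

# np.unique drops duplicates and returns a sorted NumPy array.
sorted_unique = np.unique(df[0])

# Series.unique() drops duplicates but keeps order of first appearance.
appearance_order = df[0].unique()

# Sorting the pandas result afterwards gives the same values as np.unique.
sorted_via_pandas = np.sort(df[0].unique())

# Both results are arrays, so positional lookup works.
smallest = sorted_unique[0]
```

Either way the membership test against a growing list is avoided, which is what made the original loop slow.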
