nexla

Reputation: 453

From pandas array without duplicates to another data structure?

I have a pandas DataFrame with a column of about 10k values. I want to get an array of those values without duplicates that also supports lookup by index and is sorted.

import pandas as pd
df = pd.read_csv('path',sep=';')
arr = []
for i in df[0].values:
    if i not in arr:
        arr.append(i)

This is very time- and memory-consuming: it iterates through the 10k-element array, checks for each element whether it is already stored in the newly created list, and appends it only if it is not. I know a set cannot contain duplicates, but I cannot easily look up an element by index in a set, and a set is not ordered. Is there another possible solution?

Upvotes: 0

Views: 233

Answers (2)

Anton vBR

Reputation: 18916

You are looking for np.unique:

np.unique(df[0])

Or the pandas equivalent, .unique() (note that it returns values in order of first appearance rather than sorted):

df[0].unique()
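
To illustrate how the two calls differ, here is a minimal sketch. The sample Series `s` is made up for the example and stands in for the `df[0]` column from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's df[0] column.
s = pd.Series([3, 1, 2, 3, 1])

# np.unique removes duplicates and returns the values sorted ascending.
sorted_unique = np.unique(s)

# Series.unique removes duplicates but keeps order of first appearance.
first_seen_unique = s.unique()

print(sorted_unique)
print(first_seen_unique)

# Both results are NumPy arrays, so positional lookup works directly.
print(sorted_unique[0])
```

If the result has to be sorted, as the question asks, np.unique is probably the more direct fit; the output of .unique() would need a separate sort.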

Upvotes: 1

demirbilek

Reputation: 351

You can use pandas.DataFrame.drop_duplicates. For more information, see the drop_duplicates() documentation.
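
A minimal sketch of how this could look, assuming a made-up DataFrame with a single column named `value` (the column name and data are placeholders, not from the question). Since drop_duplicates keeps the original order, the sketch sorts afterwards and resets the index so positional lookup starts at 0:

```python
import pandas as pd

# Hypothetical data; "value" is a placeholder column name.
df = pd.DataFrame({"value": [3, 1, 2, 3, 1]})

unique_sorted = (
    df["value"]
    .drop_duplicates()        # keep the first occurrence of each value
    .sort_values()            # sort ascending
    .reset_index(drop=True)   # renumber so positions run 0..n-1
)

print(unique_sorted)

# Lookup by position.
print(unique_sorted.iloc[0])
```

This keeps the result as a pandas Series; calling .to_numpy() on it would give a plain array if that is preferred.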

Upvotes: 2
