Invictus

Reputation: 4328

Find unique values for all the columns of a dataframe

How can I get the unique values of every column in a dataframe? This is what I am trying at the moment:

for col in train_features_df.columns:
    print(train_features_df.col.unique())

But this gives me the error AttributeError: 'DataFrame' object has no attribute 'col'

For example, for the dataframe below I want the following output:

import pandas as pd

df = pd.DataFrame({'A':[1,1,3],
                   'B':[4,5,6],
                   'C':[7,7,7]})

I want an output of 1, 3 for A, 4, 5, 6 for B, and 7 for C.

Upvotes: 4

Views: 900

Answers (3)

Loc Quan

Reputation: 91

Use df.apply(pd.unique). It is more readable, gives the same output as the accepted answer, and is slightly faster in the small benchmark below.

df = pd.DataFrame({'A':[1,1,3], 'B':[4,5,6], 'C':[7,7,7]})

df.apply(pd.unique)

Output

A       [1, 3]
B    [4, 5, 6]
C          [7]
dtype: object

Small benchmark

df.apply(pd.unique)

374 μs ± 3.53 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# accepted answer
df.T.apply(lambda x: x.unique(), axis=1)

388 μs ± 3.72 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
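One caveat worth hedging: as far as I know, df.apply(pd.unique) only returns a Series of arrays when the columns have different numbers of unique values; if every column happens to have the same number, apply may assemble the arrays into a DataFrame instead. A minimal sketch that avoids depending on that behaviour is a plain dict comprehension. The name uniques is only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3], 'B': [4, 5, 6], 'C': [7, 7, 7]})

# Build one entry per column; each value is a plain Python list,
# so it does not matter whether the lists have equal or different lengths.
uniques = {col: df[col].unique().tolist() for col in df.columns}

print(uniques)
```

The result is an ordinary dict keyed by column name, which is easy to inspect or pass on to other code.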

Upvotes: 0

Ghanshyam Savaliya

Reputation: 608

You can use a for loop with drop_duplicates() to get the desired result; there is no need for a more complex function.

import pandas as pd
df = pd.DataFrame({'A':[1,1,3],'B':[4,5,6],'C':[7,7,7]})

for i in df.columns:
    print(f'{i} : {list(df[i].drop_duplicates())}')

Output will be as below:

A : [1, 3]
B : [4, 5, 6]
C : [7]
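The loop in the question fails for a reason that is separate from the choice of method: train_features_df.col asks for a column literally named "col", whereas bracket indexing uses the value of the loop variable. A minimal sketch of the original loop with only that change, using the example dataframe from the question (the results dict is added here just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3], 'B': [4, 5, 6], 'C': [7, 7, 7]})

results = {}
for col in df.columns:
    # df.col would look up an attribute named "col";
    # df[col] looks up the column whose name is stored in col.
    results[col] = df[col].unique()
    print(col, results[col])
```

Here unique() returns a NumPy array for each column; wrap it in list() if plain Python lists are preferred.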

Upvotes: 0

han solo

Reputation: 6590

You can apply unique to each column by transposing the dataframe, so that every original column becomes a row, like this:

>>> df
   A  B  C
0  1  4  7
1  1  5  7
2  3  6  7
>>> df.T.apply(lambda x: x.unique(), axis=1)
A       [1, 3]
B    [4, 5, 6]
C          [7]
dtype: object

Upvotes: 4
