Reputation: 4328
How can I get the unique values of every column in a DataFrame? This is what I am trying at the moment:
for col in train_features_df.columns:
    print(train_features_df.col.unique())
But this gives me the error AttributeError: 'DataFrame' object has no attribute 'col'
For example, given the DataFrame below:
df = pd.DataFrame({'A':[1,1,3],
                   'B':[4,5,6],
                   'C':[7,7,7]})
I want an output of 1, 3 for A; 4, 5, 6 for B; and 7 for C.
Upvotes: 4
Views: 900
Reputation: 91
Use df.apply(pd.unique) for more readable code. It produces the same output as the accepted answer and was slightly faster in the small benchmark below.
df = pd.DataFrame({'A':[1,1,3], 'B':[4,5,6], 'C':[7,7,7]})
df.apply(pd.unique)
Output
A       [1, 3]
B    [4, 5, 6]
C          [7]
dtype: object
Small benchmark
df.apply(pd.unique)
374 μs ± 3.53 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# accepted answer
df.T.apply(lambda x: x.unique(), axis=1)
388 μs ± 3.72 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
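One thing to watch for, as far as I know: when every column happens to have the same number of unique values, apply may assemble the results into a DataFrame instead of a Series of arrays. If you want a result whose shape does not depend on the data, a plain dict comprehension is a possible alternative. This is only a sketch; the name uniques is mine and not from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3], 'B': [4, 5, 6], 'C': [7, 7, 7]})

# Map each column name to a plain list of its unique values,
# kept in order of first appearance.
uniques = {col: df[col].unique().tolist() for col in df.columns}

print(uniques)
# {'A': [1, 3], 'B': [4, 5, 6], 'C': [7]}
```

The dict is also easy to look up by column name, e.g. uniques['B'].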
Upvotes: 0
Reputation: 608
You can use a for loop with drop_duplicates() to get the desired result; there is no need for any complex function.
import pandas as pd
df = pd.DataFrame({'A':[1,1,3],'B':[4,5,6],'C':[7,7,7]})
for i in df.columns:
    print(f'{i} : {list(df[i].drop_duplicates())}')
Output will be as below:
A : [1, 3]
B : [4, 5, 6]
C : [7]
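For what it's worth, the loop from the question also works with unique() once the column is looked up with brackets. The AttributeError comes from df.col, which asks for a column literally named "col" rather than using the loop variable. A minimal sketch on the example frame (the results dict is just my own helper for collecting the values):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3], 'B': [4, 5, 6], 'C': [7, 7, 7]})

results = {}  # hypothetical helper: column name -> array of unique values
for col in df.columns:
    # df[col] uses the *value* of the loop variable;
    # df.col would look for a column literally named "col".
    results[col] = df[col].unique()
    print(col, results[col])
```

Note that unique() returns a NumPy array, so wrap it in list(...) if you need a plain Python list.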
Upvotes: 0
Reputation: 6590
You can apply unique to each Series by transposing the DataFrame, like so:
>>> df
   A  B  C
0  1  4  7
1  1  5  7
2  3  6  7
>>> df.T.apply(lambda x: x.unique(), axis=1)
A       [1, 3]
B    [4, 5, 6]
C          [7]
dtype: object
Upvotes: 4