Reputation: 477
I have a very large dataframe with over 2000 columns. I am trying to count the number of unique values in each column and filter out the columns whose unique-value count is below a certain threshold. Here is an example:
import pandas as pd
df = pd.DataFrame({'A': ('a', 'b', 'c', 'd', 'e', 'a', 'a'), 'B': (1, 1, 2, 1, 3, 3, 1)})
df.nunique()
A 5
B 3
dtype: int64
So let's say I want to filter out column B, which has fewer than 5 unique values, and return a df without column B.
Thanks.
Upvotes: 1
Views: 2816
Reputation: 70
Others may have a more Pythonic way, but try this out to see if it works.
x = df.nunique()
df[list(x[x>=5].index)]
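The same selection can also be phrased the other way round, by dropping the low-count columns rather than keeping the rest. This is only a sketch on the question's example frame; the names min_unique, counts, to_drop and result are introduced here for illustration and are not from the original posts.

```python
import pandas as pd

df = pd.DataFrame({'A': ('a', 'b', 'c', 'd', 'e', 'a', 'a'),
                   'B': (1, 1, 2, 1, 3, 3, 1)})

min_unique = 5          # illustrative threshold, taken from the question's example
counts = df.nunique()   # number of distinct values per column

# Labels of the columns with fewer than min_unique distinct values
to_drop = counts[counts < min_unique].index

# drop() returns a new DataFrame; df itself is left unchanged
result = df.drop(columns=to_drop)
```

Because drop() does not modify df in place, the original frame stays available if a different threshold needs to be tried afterwards.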
Upvotes: 3
Reputation: 323226
Pass a boolean mask over the columns to .loc:
df = df.loc[:, df.nunique() >= 5]
A
0 a
1 b
2 c
3 d
4 e
5 a
6 a
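For a frame with thousands of columns it may be convenient to wrap the mask in a small function so the threshold is a parameter. The following is a sketch, not a definitive implementation: keep_high_cardinality and its arguments are hypothetical names chosen here. It relies on the fact that nunique() ignores NaN by default (dropna=True), so passing dropna=False would count NaN as a value of its own.

```python
import pandas as pd

def keep_high_cardinality(frame, min_unique, dropna=True):
    """Return only the columns of frame with at least min_unique distinct values."""
    # Boolean Series indexed by column label: True where the column qualifies
    mask = frame.nunique(dropna=dropna) >= min_unique
    # Select all rows and only the columns where the mask is True
    return frame.loc[:, mask]

df = pd.DataFrame({'A': ('a', 'b', 'c', 'd', 'e', 'a', 'a'),
                   'B': (1, 1, 2, 1, 3, 3, 1)})

filtered = keep_high_cardinality(df, 5)   # keeps A (5 unique), drops B (3 unique)
```

With a threshold of 3 instead of 5, both columns of the example would be kept.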
Upvotes: 5