Reputation: 1227
How can I get the value_counts above a threshold? I tried
df[df[col].value_counts(dropna=False) > 3]
to get all counts greater than 3, but I am getting
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
Any hints? Thanks.
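The error comes from index alignment. `value_counts` returns a Series indexed by the distinct values of the column, while `df` is indexed by row labels, so pandas cannot line the boolean mask up with the rows. A minimal sketch reproducing this, assuming a small hypothetical sample DataFrame:

```python
import pandas as pd

# Hypothetical sample data; `col` holds the column name, as in the question.
df = pd.DataFrame({"col": ["a", "b", "a", "c", "a", "b", "a"]})
col = "col"

mask = df[col].value_counts(dropna=False) > 3

# The mask is indexed by the distinct values of the column...
print(mask.index.tolist())  # ['a', 'b', 'c']
# ...but df is indexed by row labels, so the two cannot be aligned.
print(df.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]

try:
    df[mask]
    raised = False
except Exception as exc:  # pandas raises IndexingError for this mismatch
    raised = True
    print(type(exc).__name__)
```

Every approach below works around this either by filtering the counts themselves or by turning them into a mask that shares `df`'s index.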
Upvotes: 4
Views: 8342
Reputation: 26
Sticking with value_counts, here's a simple solution. Note that it returns the qualifying counts themselves, not the matching rows of df:
df[col].value_counts(dropna=False)[df[col].value_counts(dropna=False) > 3]
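The line above calls `value_counts` twice. A minimal sketch of the same filter computed in a single pass with a callable passed to `.loc`, assuming a hypothetical sample DataFrame:

```python
import pandas as pd

# Hypothetical sample data; `col` holds the column name.
df = pd.DataFrame({"col": ["a", "b", "a", "c", "a", "b", "a"]})
col = "col"

# Compute the counts once and filter them with a callable. The result is
# still a Series of counts indexed by value, not the matching rows of df.
frequent = df[col].value_counts(dropna=False).loc[lambda s: s > 3]
print(frequent.to_dict())
```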
Upvotes: 0
Reputation: 323386
Try with isin and chain it with your original value_counts:
out = df[df[col].isin(df[col].value_counts(dropna=False).loc[lambda x: x > 3].index)].copy()
Alternatively, try groupby with filter:
out = df.groupby(col).filter(lambda x : len(x)>3)
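A minimal sketch checking that the two approaches in this answer keep the same rows, assuming a hypothetical sample DataFrame with no missing values in `col`:

```python
import pandas as pd

# Hypothetical sample data with an extra column so the kept rows are visible.
df = pd.DataFrame({"col": ["a", "b", "a", "c", "a", "b", "a"], "val": range(7)})
col = "col"

# isin approach: keep rows whose value is among the frequent values.
frequent_values = df[col].value_counts(dropna=False).loc[lambda s: s > 3].index
out_isin = df[df[col].isin(frequent_values)].copy()

# groupby/filter approach: keep every group that has more than 3 rows.
out_filter = df.groupby(col).filter(lambda g: len(g) > 3)

print(out_isin.index.tolist())  # [0, 2, 4, 6]
print(out_isin.equals(out_filter))  # True
```

One difference to keep in mind: groupby drops missing keys by default (`dropna=True`), so rows with NaN in `col` are not kept by the filter version, whereas `value_counts(dropna=False)` does count them.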
Upvotes: 3
Reputation: 150815
Try:
df[df.groupby(col)[col].transform('size')>3]
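This works because `transform('size')` broadcasts each group's size back to every row, so the result shares `df`'s index and can be used directly as a boolean mask. A minimal sketch, assuming a hypothetical sample DataFrame:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col": ["a", "b", "a", "c", "a", "b", "a"], "val": range(7)})
col = "col"

# One group size per row, aligned with df.index.
sizes = df.groupby(col)[col].transform("size")
print(sizes.tolist())  # [4, 2, 4, 1, 4, 2, 4]

out = df[sizes > 3]
print(out.index.tolist())  # [0, 2, 4, 6]
```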
Or with value_counts:
counts = df[col].value_counts(dropna=False)
valids = counts[counts>3].index
df[df[col].isin(valids)]
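If the threshold changes often, the three lines above can be wrapped in a small function. A minimal sketch; `filter_by_count` is a hypothetical helper name, not a pandas API:

```python
import pandas as pd


def filter_by_count(frame: pd.DataFrame, column: str, threshold: int) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose value in `column` occurs more than `threshold` times."""
    counts = frame[column].value_counts(dropna=False)
    valids = counts[counts > threshold].index
    return frame[frame[column].isin(valids)]


# Hypothetical sample data.
df = pd.DataFrame({"col": ["a", "b", "a", "c", "a", "b", "a"], "val": range(7)})
print(filter_by_count(df, "col", 3)["val"].tolist())  # [0, 2, 4, 6]
print(filter_by_count(df, "col", 1)["val"].tolist())  # [0, 1, 2, 4, 5, 6]
```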
Another approach with value_counts and map:
counts = df[col].value_counts(dropna=False)
df[df[col].map(counts)>3]
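Here `map` looks up each row's value in the `counts` Series, producing a per-row count aligned with `df`'s index. A minimal sketch that also cross-checks the result against the isin version, assuming a hypothetical sample DataFrame:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col": ["a", "b", "a", "c", "a", "b", "a"], "val": range(7)})
col = "col"

counts = df[col].value_counts(dropna=False)

# Each row gets the count of its own value.
per_row = df[col].map(counts)
print(per_row.tolist())  # [4, 2, 4, 1, 4, 2, 4]

out_map = df[per_row > 3]
# The isin version on the same data keeps the same rows.
out_isin = df[df[col].isin(counts[counts > 3].index)]
print(out_map.equals(out_isin))  # True
```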
Upvotes: 8