Reputation: 2498
I want to filter my df down to only those rows whose value in column A appears at least as often as some threshold. I am currently using a trick with two value_counts() calls. To explain what I mean:
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [1, 4, 5], [6, 7, 8]], columns=['A', 'B', 'C'])
'''
   A  B  C
0  1  2  3
1  1  4  5
2  6  7  8
'''
I want to remove any row whose value in the A column appears fewer than 2 times in that column. I currently do this:
df = df[df['A'].isin(df.A.value_counts()[df.A.value_counts() >= 2].index)]
Does pandas have a cleaner way to do this than calling value_counts() twice?
Upvotes: 0
Views: 41
Reputation: 16147
It's probably easiest to group on column A and filter by group size.
df.groupby('A').filter(lambda x: len(x) >= 2)
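For reference, here is a minimal runnable sketch of the group-size filter on the question's example frame, alongside two alternatives that avoid a Python-level lambda. The `threshold` variable and the `kept_*` names are illustrative assumptions, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [1, 4, 5], [6, 7, 8]], columns=['A', 'B', 'C'])
threshold = 2  # hypothetical name for the minimum number of occurrences

# Option 1: keep only the groups of A that have at least `threshold` rows.
kept_filter = df.groupby('A').filter(lambda g: len(g) >= threshold)

# Option 2: broadcast each group's size back onto its rows, then use it as a mask.
kept_mask = df[df.groupby('A')['A'].transform('size') >= threshold]

# Option 3: compute value_counts() once and map the counts onto column A.
counts = df['A'].value_counts()
kept_map = df[df['A'].map(counts) >= threshold]

print(kept_filter)
```

All three should keep the two rows where A is 1 and drop the row where A is 6. On large frames with many distinct values in A, the transform or map variants may be faster than filter with a lambda, since the lambda is called once per group, though that is worth measuring on your own data.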
Upvotes: 3