Alan

Reputation: 2498

Does Pandas Have an Alternative to This Syntax I'm Currently Using?

I want to filter my df down to only those rows whose value in column A appears at least some threshold number of times. I currently use a trick with two value_counts() calls. To explain what I mean:

import pandas as pd
df = pd.DataFrame([[1, 2, 3], [1, 4, 5], [6, 7, 8]], columns=['A', 'B', 'C'])

'''
    A   B   C
0   1   2   3
1   1   4   5
2   6   7   8
'''

I want to remove any row whose value in column A appears fewer than 2 times in that column. I currently do this:

df = df[df['A'].isin(df.A.value_counts()[df.A.value_counts() >= 2].index)]

Does Pandas have a method to do this which is cleaner than having to call value_counts() twice?

Upvotes: 0

Views: 41

Answers (1)

Chris

Reputation: 16147

It's probably easiest to group on column A and filter by group size.

df.groupby('A').filter(lambda x: len(x) >= 2)
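For comparison, here is a small sketch that runs the group-size filter next to two other options that should give the same rows on the question's sample frame: a boolean mask built with transform('size'), and a single value_counts() call mapped back onto the column. The variable names (threshold, by_filter, by_transform, by_counts) are just illustrative, and I have only checked this against the sample data above.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [1, 4, 5], [6, 7, 8]], columns=['A', 'B', 'C'])

threshold = 2  # illustrative name: minimum number of occurrences to keep a row

# Option 1: keep whole groups whose size meets the threshold
by_filter = df.groupby('A').filter(lambda g: len(g) >= threshold)

# Option 2: broadcast each group's size back to its rows, then mask
by_transform = df[df.groupby('A')['A'].transform('size') >= threshold]

# Option 3: call value_counts() once and map the counts onto column A
counts = df['A'].value_counts()
by_counts = df[df['A'].map(counts) >= threshold]

print(by_filter)
```

The lambda in filter is called once per group in Python, so on frames with many distinct values in A the transform or map versions may be noticeably faster. That is worth timing on your own data rather than taking on faith.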

Upvotes: 3
