radarsh
radarsh

Reputation: 13

How to filter subcategories of rows from one column, based on counts in second column

Sorry it's a bit complicated, but lets say I have a very long table of IDs and Fruits:

ID Fruit
1 Apple
2 Banana
4 Orange
... ...
3 Banana
1 Orange

The ID may be repeated several times in the table and the fruit may also be repeat several times. For example, in the whole dataframe, ID #1 can have 3 instances of "Apple" and 2 instances of "Orange", etc. ID #2 can have 0 instances of "Apple" and 5 instances of "Orange"

My goal is to filter the table such that as long as the ID has one instance of fruit "Apple", then ALL rows with that ID should remain in the table. So let's say that in the table above, ID # 4 and ID #2 do not have a single instance of "Apple" in the entire dataframe. So the ALL rows that contained ID 4 and 2 should be removed from the dataframe

ID Fruit
1 Apple
... ...
3 Banana
1 Orange

Any help is appreciated, thank you!

Upvotes: 1

Views: 42

Answers (2)

jezrael
jezrael

Reputation: 863166

Use Series.isin for filter groups with at least one Apple in column Fruit filtered in DataFrame.loc:

df[df['ID'].isin(df.loc[df['Fruit'].eq('Apple'), 'ID'])]

Upvotes: 0

mozway
mozway

Reputation: 261860

You can use groupby.transform('any') combined with boolean indexing.

out = df[df['Fruit'].eq('Apple').groupby(df['ID']).transform('any')]

Upvotes: 0

Related Questions