Reputation: 720
I have a data frame like this:
ID A B C
"Z" "apple" 1 5
"Z" "pear" 3 1
"C" "apple" 1 8
"E" "strawberry" 2 5
"E" "pear" 5 1
"D" "apple" 1 5
"D" "pear" 3 1
"D" "melon" 1 5
For those with the same id, I want to filter the rows as follows: if there are two records under one ID and one of them is "apple", I want to delete the row containing "apple". If I have more than two records and one of them is "apple" as well as if I have more than one record for that ID and none of them is "apple" I want to delete all records belonging to that ID. So the df should come out like this:
ID A B C
"Z" "pear" 3 1
"C" "apple" 1 8
I am guessing that the starting point should be a groupby by ID, but I can't figure out how to write a function that does the rest.
Upvotes: 1
Views: 52
Reputation: 862581
Use boolean indexing
with chained mask by |
for bitwise OR
:
#filter apples
m0 = df['A'].eq('apple')
#get counts per groups
s = df.groupby('ID')['ID'].transform('size')
#check if at least one apple per group
m2 = m0.groupby(df['ID']).transform('any')
#chain mask with length 2, at least one apple and all not apples OR
#length 1 with apples
df = df[(s.eq(2) & ~m0 & m2) | (s.eq(1) & m0)]
print (df)
ID A B C
1 Z pear 3 1
2 C apple 1 8
Details:
with pd.option_context('expand_frame_repr', False):
print (df.assign(m = m0,
s = s,
m2 = m2,
s2 = s.eq(2),
invm0 = ~m0,
first = (s.eq(2) & ~m0 & m2),
s1 = s.eq(1),
second = (s.eq(1) & m0),
both =(s.eq(2) & ~m0 & m2) | (s.eq(1) & m0)
))
ID A B C m s m2 s2 invm0 first s1 second both
0 Z apple 1 5 True 2 True True False False False False False
1 Z pear 3 1 False 2 True True True True False False True
2 C apple 1 8 True 1 True False False False True True True
3 E strawberry 2 5 False 2 False True True False False False False
4 E pear 5 1 False 2 False True True False False False False
5 D apple 1 5 True 3 True False False False False False False
6 D pear 3 1 False 3 True False True False False False False
7 D melon 1 5 False 3 True False True False False False False
Upvotes: 4