Delete items in pandas groupby groups based on values

Question

I have a data frame like this:

ID     A              B     C
"Z"    "apple"        1     5
"Z"    "pear"         3     1
"C"    "apple"        1     8
"E"    "strawberry"   2     5
"E"    "pear"         5     1
"D"    "apple"        1     5
"D"    "pear"         3     1
"D"    "melon"        1     5

For those with the same id, I want to filter the rows as follows: if there are two records under one ID and one of them is "apple", I want to delete the row containing "apple". If I have more than two records and one of them is "apple" as well as if I have more than one record for that ID and none of them is "apple" I want to delete all records belonging to that ID. So the df should come out like this:

ID     A              B     C
"Z"    "pear"         3     1
"C"    "apple"        1     8

I am guessing that the starting point should be a groupby by ID, but I can't figure out how to write a function that does the rest.

jezrael · Accepted Answer

Use boolean indexing with chained mask by | for bitwise OR:

#filter apples
m0 = df['A'].eq('apple')
#get counts per groups
s = df.groupby('ID')['ID'].transform('size')
#check if at least one apple per group
m2 = m0.groupby(df['ID']).transform('any')

#chain mask with length 2, at least one apple and all not apples OR 
#length 1 with apples
df = df[(s.eq(2) & ~m0 & m2) | (s.eq(1) & m0)]
print (df)
  ID      A  B  C
1  Z   pear  3  1
2  C  apple  1  8

Details:

with pd.option_context('expand_frame_repr', False):

    print (df.assign(m = m0, 
                     s = s,
                     m2 = m2,
                     s2 = s.eq(2),
                     invm0 = ~m0,
                     first = (s.eq(2) & ~m0 & m2),
                     s1 = s.eq(1),
                     second = (s.eq(1) & m0),
                     both =(s.eq(2) & ~m0 & m2) | (s.eq(1) & m0)
           ))
  ID           A  B  C      m  s     m2     s2  invm0  first     s1  second   both
0  Z       apple  1  5   True  2   True   True  False  False  False   False  False
1  Z        pear  3  1  False  2   True   True   True   True  False   False   True
2  C       apple  1  8   True  1   True  False  False  False   True    True   True
3  E  strawberry  2  5  False  2  False   True   True  False  False   False  False
4  E        pear  5  1  False  2  False   True   True  False  False   False  False
5  D       apple  1  5   True  3   True  False  False  False  False   False  False
6  D        pear  3  1  False  3   True  False   True  False  False   False  False
7  D       melon  1  5  False  3   True  False   True  False  False   False  False

Delete items in pandas groupby groups based on values

Answers (1)

Related Questions