get groups that contain all needed values

Question

import pandas as pd

df = pd.DataFrame({'A': ['bar', 'bar', 'bar', 'foo', 'foo', 'foo'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})
>>> df
     A  B    C
0  bar  1  2.0
1  bar  2  5.0
2  bar  3  8.0
3  foo  4  1.0
4  foo  5  2.0
5  foo  6  9.0

If I groupby('A'), how can I get the groups whose column C contains every value in neededVals = [1.0, 2.0]? Expected output:

3  foo  4  1.0
4  foo  5  2.0
5  foo  6  9.0

And how can I also get just the rows holding those values from such groups:

3  foo  4  1.0
4  foo  5  2.0

jezrael · Accepted Answer

I think you need to compare sets with GroupBy.transform and then filter by boolean indexing:

neededVals = [1.0, 2.0]
# keep all rows of groups whose C values are a superset of neededVals
df1 = df[df.groupby('A')['C'].transform(lambda x: set(x) >= set(neededVals))]
print (df1)
     A  B    C
3  foo  4  1.0
4  foo  5  2.0
5  foo  6  9.0

Detail:

print (df.groupby('A')['C'].transform(lambda x: set(x) >= set(neededVals)))
0    False
1    False
2    False
3     True
4     True
5     True
Name: C, dtype: bool
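
As an alternative sketch (not part of the original answer), the same superset test can be written with GroupBy.filter, which calls the function once per group and keeps or drops the whole group based on the single boolean it returns. The name kept is only illustrative, and the sample data is the frame from the question:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'A': ['bar', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'B': [1, 2, 3, 4, 5, 6],
    'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0],
})
neededVals = [1.0, 2.0]

# Keep every row of each group whose C values include all of neededVals.
# The lambda receives one group's sub-frame and must return a single bool.
kept = df.groupby('A').filter(lambda g: set(neededVals) <= set(g['C']))
print(kept)
```

This avoids building the intermediate boolean Series, at the cost of not having a mask to reuse for other selections.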

For the second output, first filter out the unneeded rows with isin, then compare the sets for equality:

# drop rows whose C value is not needed, then keep groups holding every needed value
df2 = df[df['C'].isin(neededVals)]
df2 = df2[df2.groupby('A')['C'].transform(lambda x: set(x) == set(neededVals))]
print (df2)
     A  B    C
3  foo  4  1.0
4  foo  5  2.0
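
The two steps above can be wrapped in a small helper. This is a minimal sketch under assumptions: the function name rows_with_all_needed and its parameters are hypothetical, and it uses GroupBy.filter for the per-group check instead of the transform mask shown in the answer:

```python
import pandas as pd

def rows_with_all_needed(frame, group_col, value_col, needed):
    """Return only the rows holding a needed value, restricted to groups
    that contain every needed value (hypothetical helper)."""
    needed = set(needed)
    # Step 1: drop rows whose value is not one of the needed values.
    subset = frame[frame[value_col].isin(needed)]
    # Step 2: after step 1 each group's values are a subset of `needed`,
    # so equality means the group holds all of them.
    return subset.groupby(group_col).filter(
        lambda g: set(g[value_col]) == needed
    )

# Sample data copied from the question.
df = pd.DataFrame({
    'A': ['bar', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'B': [1, 2, 3, 4, 5, 6],
    'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0],
})

result = rows_with_all_needed(df, 'A', 'C', [1.0, 2.0])
print(result)
```

Group bar contains 2.0 but not 1.0, so its matching row is dropped along with the rest of the group.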
