Reputation: 129
I'm new to Python and Pandas. I have the following DataFrame:
import pandas as pd
df = pd.DataFrame( {'a':['A','A','B','B','B','C','C','C'], 'b':[1,3,1,2,3,1,3,3]})
a b
0 A 1
1 A 3
2 B 1
3 B 2
4 B 3
5 C 1
6 C 3
7 C 3
I would like to create a new DataFrame in which only groups from column A that have the values 1 and 2 in column b show up, that is:
a b
0 B 1
1 B 2
2 B 3
I know we can create groups using df.groupby('a'), and the method df.all() seems to be related to this, but I can't figure it out by myself. It seems like it should be straightforward. Any help?
Upvotes: 0
Views: 116
Reputation: 30930
Use GroupBy.filter
+ Series.any
:
new_df=df.groupby('a').filter(lambda x: x.b.eq(2).any() & x.b.eq(1).any())
print(new_df)
a b
2 B 1
3 B 2
4 B 3
We could also use:
new_df=df[df.groupby('a').transform(lambda x: x.eq(1).any() & x.eq(2).any()).b]
print(new_df)
a b
2 B 1
3 B 2
4 B 3
Upvotes: 2
Reputation: 150785
Another approach:
s = (pd.DataFrame(df['b'].values == np.array([[1],[2]])).T
.groupby(df['a'])
.transform('any')
.all(1)
)
df[s]
Output:
a b
2 B 1
3 B 2
4 B 3
Upvotes: 1