Reputation: 1857
I want to check, for each group of the column user, whether the column app contains a specific element, such as b.
import pandas as pd
df=pd.DataFrame({'user':[1,1,1,2,2,3,3],'app':['a','b','c','a','c','b','c']})
Input:
app user
0 a 1
1 b 1
2 c 1
3 a 2
4 c 2
5 b 3
6 c 3
Expected:
app user contains_b
0 a 1 1
1 b 1 1
2 c 1 1
3 a 2 0
4 c 2 0
5 b 3 1
6 c 3 1
Upvotes: 1
Views: 76
Reputation: 59274
Using isin
df['contains_b'] = df.groupby('user').app.transform(lambda x: x.isin(['b']).any()).astype(int)
user app contains_b
0 1 a 1
1 1 b 1
2 1 c 1
3 2 a 0
4 2 c 0
5 3 b 1
6 3 c 1
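If the element to look for varies, the same groupby/transform pattern can be wrapped in a small helper. This is only a sketch: the function name add_contains_flag and its parameters are hypothetical, not part of pandas, and the defaults assume the user/app column names from the question.

```python
import pandas as pd


def add_contains_flag(df, value, group_col='user', value_col='app'):
    """Hypothetical helper: return a copy of df with a 0/1 column
    telling whether `value` occurs in `value_col` within each group."""
    flag = (
        df.groupby(group_col)[value_col]
          # One boolean per group, broadcast back to every row of the group.
          .transform(lambda x: x.isin([value]).any())
          .astype(int)
    )
    # assign returns a new DataFrame; the input is left unchanged.
    return df.assign(**{f'contains_{value}': flag})


df = pd.DataFrame({'user': [1, 1, 1, 2, 2, 3, 3],
                   'app': ['a', 'b', 'c', 'a', 'c', 'b', 'c']})

with_b = add_contains_flag(df, 'b')
with_a = add_contains_flag(df, 'a')
```

Because isin takes a list, the lambda could also test several elements at once, though that goes beyond what the question asks.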
Upvotes: 2
Reputation: 294488
Use transform with any:
df.assign(contains_b=df.app.eq('b').groupby(df.user).transform('any').astype(int))
app user contains_b
0 a 1 1
1 b 1 1
2 c 1 1
3 a 2 0
4 c 2 0
5 b 3 1
6 c 3 1
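To see what each part of that one-liner does, the chain can be split into intermediate steps. A minimal sketch on the question's sample data; the variable names is_b, per_user and per_row are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 2, 2, 3, 3],
                   'app': ['a', 'b', 'c', 'a', 'c', 'b', 'c']})

# Row-level test: True where app equals 'b'.
is_b = df['app'].eq('b')

# Group-level result: one boolean per user (indexed by user).
per_user = is_b.groupby(df['user']).any()

# transform('any') computes the same thing but broadcasts it back
# to the original row index, so it lines up with df.
per_row = is_b.groupby(df['user']).transform('any')

# assign returns a new DataFrame and leaves df itself untouched.
out = df.assign(contains_b=per_row.astype(int))
print(out)
```

Since assign does not modify df in place, the result has to be bound to a name (or assigned back to df) to be kept.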
Upvotes: 4
Reputation: 863226
Use:
df['contains_b'] = df['user'].isin(df.loc[df['app'].eq('b'), 'user'].unique()).astype(int)
print (df)
user app contains_b
0 1 a 1
1 1 b 1
2 1 c 1
3 2 a 0
4 2 c 0
5 3 b 1
6 3 c 1
Details:
First, filter the column app with eq (==) and get the user value of every matching row:
print (df.loc[df['app'].eq('b'), 'user'])
1 1
5 3
Name: user, dtype: int64
For better performance, use unique:
print (df.loc[df['app'].eq('b'), 'user'].unique())
[1 3]
Then test the user column for membership with isin:
print (df['user'].isin(df.loc[df['app'].eq('b'), 'user'].unique()))
0 True
1 True
2 True
3 False
4 False
5 True
6 True
Name: user, dtype: bool
Finally, cast to integer with astype(int), so that True becomes 1 and False becomes 0.
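As a quick sanity check, the isin-based result can be compared with a group-wise transform('any') on the same sample data. A minimal sketch; the variable names are illustrative, and it only shows that the two agree on this small frame, not which is faster.

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 2, 2, 3, 3],
                   'app': ['a', 'b', 'c', 'a', 'c', 'b', 'c']})

# Membership approach: collect the users that have a 'b' row,
# then flag every row belonging to one of those users.
users_with_b = df.loc[df['app'].eq('b'), 'user'].unique()
by_isin = df['user'].isin(users_with_b).astype(int)

# Group-wise approach: broadcast "any row equals 'b'" within each user.
by_transform = (
    df['app'].eq('b').groupby(df['user']).transform('any').astype(int)
)

# Both Series share df's row order, so a plain list comparison is enough.
same = by_isin.tolist() == by_transform.tolist()
print(same)
```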
Upvotes: 3