Reputation:
i Have a dataframe with two columns. Sample of the df is attached below.
id Content
21 John
25 kevin
26 john
28 levis
21 john
21 carls
25 kevin
29 john
25 carls
25 john
29 kevin
21 carls
21 carls
25 john
What I want to do is groupby using id and set their occurrences and if it contains john and carls combination, assied 1 for that particular id.
Id Content
21 John
John
carls
carls
carls
25 kevin
kevin
carls
John
John
26 john
28 levis
29 john
kevin
Final required dataframe:
Id Result
21 1
25 1
26 0
28 0
29 0
I did the groupby part but don't know how to get the final required output df.
Really appreciate your help !!!!!
Upvotes: 0
Views: 57
Reputation: 34046
Use df.Groupby
with Series.unique
and Series.apply
:
In [2511]: x = df.groupby('id').Content.unique().reset_index()
In [2514]: x['Result'] = x.Content.apply(lambda x: 1 if 'john' in x and 'carls' in x else 0)
In [2516]: x.drop('Content', 1, inplace=True)
In [2517]: x
Out[2517]:
id Result
0 21 1
1 25 1
2 26 0
3 28 0
4 29 0
EDIT: For checking the order
as well, you can do this:
In [2539]: x['Result'] = x.Content.apply(lambda x: 1 if ('john', 'carls') in zip(x, x[1:]) else 0)
In [2541]: x.drop('Content', 1, inplace=True)
In [2542]: x
Out[2542]:
id Result
0 21 1
1 25 0
2 26 0
3 28 0
4 29 0
Upvotes: 0
Reputation: 75080
You can try using set.issubset
with groupby
here to check membership of both john
and carl
in each group :
f = lambda x: {"john","carls"}.issubset(set(x.str.lower()))
out = df.groupby("id")["Content"].agg(f).rename("result").astype(int).reset_index()
print(out)
id result
0 21 1
1 25 1
2 26 0
3 28 0
4 29 0
EDIT for question asked in comments, first sort with id ,then you can map with a dictionary and then check condition whether a value is lesser then the next row present in any of the group, then using the condition we had previously we can se an &
condition:
df = df.sort_values("id")
d = {"john":1,"carls":2}
s = df['Content'].str.lower().map(d)
c1 = s.shift().lt(s).groupby(df['id']).any()
f = lambda x: set(d.keys()).issubset(set(x.str.lower()))
out = ((df.groupby("id")["Content"].agg(f) & c1).astype(int)
.rename("result").reset_index())
print(out)
id result
0 21 1
1 25 0
2 26 0
3 28 0
4 29 0
Upvotes: 2