user13419531
user13419531

Reputation:

Group by with conditions python

i Have a dataframe with two columns. Sample of the df is attached below.

id  Content
21  John
25  kevin
26  john
28  levis
21  john
21  carls
25  kevin
29  john
25  carls
25  john
29  kevin
21  carls
21  carls
25  john

What I want to do is groupby using id and set their occurrences and if it contains john and carls combination, assied 1 for that particular id.

Id  Content
21  John
    John
    carls
    carls
    carls
25  kevin
    kevin
    carls
    John
    John
26  john
28  levis
29  john
    kevin

Final required dataframe:

Id  Result
21  1
25  1
26  0
28  0
29  0

I did the groupby part but don't know how to get the final required output df.

Really appreciate your help !!!!!

Upvotes: 0

Views: 57

Answers (2)

Mayank Porwal
Mayank Porwal

Reputation: 34046

Use df.Groupby with Series.unique and Series.apply:

In [2511]: x = df.groupby('id').Content.unique().reset_index()
In [2514]: x['Result'] = x.Content.apply(lambda x: 1 if 'john' in x and 'carls' in x else 0)

In [2516]: x.drop('Content', 1, inplace=True)

In [2517]: x
Out[2517]: 
   id  Result
0  21       1
1  25       1
2  26       0
3  28       0
4  29       0

EDIT: For checking the order as well, you can do this:

In [2539]: x['Result'] = x.Content.apply(lambda x: 1 if ('john', 'carls') in zip(x, x[1:]) else 0)
In [2541]: x.drop('Content', 1, inplace=True)

In [2542]: x
Out[2542]: 
   id  Result
0  21       1
1  25       0
2  26       0
3  28       0
4  29       0

Upvotes: 0

anky
anky

Reputation: 75080

You can try using set.issubset with groupby here to check membership of both john and carl in each group :

f = lambda x: {"john","carls"}.issubset(set(x.str.lower()))
out = df.groupby("id")["Content"].agg(f).rename("result").astype(int).reset_index()

print(out)

   id  result
0  21       1
1  25       1
2  26       0
3  28       0
4  29       0

EDIT for question asked in comments, first sort with id ,then you can map with a dictionary and then check condition whether a value is lesser then the next row present in any of the group, then using the condition we had previously we can se an & condition:

df = df.sort_values("id")
d = {"john":1,"carls":2}
s = df['Content'].str.lower().map(d)
c1 = s.shift().lt(s).groupby(df['id']).any()

f = lambda x: set(d.keys()).issubset(set(x.str.lower()))
out = ((df.groupby("id")["Content"].agg(f) & c1).astype(int)
        .rename("result").reset_index())

print(out)

   id  result
0  21       1
1  25       0
2  26       0
3  28       0
4  29       0

Upvotes: 2

Related Questions