renakre

Reputation: 8291

How to identify a specific occurrence across two rows and calculate the count

Let's say I have this pandas dataframe:

id | userid | type 
1  | 20     | a  
2  | 20     | a
3  | 20     | b
4  | 21     | a  
5  | 21     | b
6  | 21     | a
7  | 21     | b
8  | 21     | b

I want to obtain the number of times 'b follows a' for each user, and obtain a new dataframe like this:

userid | b_follows_a
20     | 1
21     | 2

I know I can do this with a for loop, but I wonder whether there is a more elegant solution.

Upvotes: 2

Views: 34

Answers (2)

MaxU - stand with Ukraine

Reputation: 210882

Creative solution:

In [49]: df.groupby('userid')['type'].sum().str.count('ab').reset_index()
Out[49]:
   userid  type
0      20     1
1      21     2

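Explanation: summing the strings within each group concatenates them, so counting 'ab' in the result counts every a that is immediately followed by b: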

In [50]: df.groupby('userid')['type'].sum()
Out[50]:
userid
20      aab
21    ababb
Name: type, dtype: object
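For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the question's data by hand and uses agg("".join) in place of sum() for the concatenation; the b_follows_a column name is taken from the question's desired output. It assumes rows are already in order within each user and that every type value is a single character, since with longer labels the joined string could produce false matches.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": range(1, 9),
    "userid": [20, 20, 20, 21, 21, 21, 21, 21],
    "type": list("aabababb"),
})

# Concatenate each user's types into one string, e.g. "aab" and "ababb".
joined = df.groupby("userid")["type"].agg("".join)

# Count non-overlapping occurrences of "ab" in each concatenated string.
result = joined.str.count("ab").rename("b_follows_a").reset_index()
print(result)
```

If the rows might not be ordered, sorting by id first (df.sort_values("id")) before grouping should keep the sequence meaningful.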

Upvotes: 2

akuiper

Reputation: 215057

You can use shift() to check whether each a is followed by b, combine the two conditions with a vectorized &, and then count the True values with sum():

df.groupby('userid').type.apply(lambda x: ((x == "a") & (x.shift(-1) == "b")).sum()).reset_index()

#    userid  type
# 0      20     1
# 1      21     2
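A variation on the same shift() idea that avoids apply with a lambda might look like the sketch below. It is not the code above, just one possible restructuring: the shift is done per group with groupby(...).shift(-1), and the resulting boolean mask is summed per user. The data is rebuilt from the question, the b_follows_a name comes from the question's desired output, and rows are assumed to be in order within each user.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": range(1, 9),
    "userid": [20, 20, 20, 21, 21, 21, 21, 21],
    "type": list("aabababb"),
})

# Next row's type within the same user; the last row of each user gets NaN,
# so a pair can never span two different users.
next_type = df.groupby("userid")["type"].shift(-1)

# True where the current row is "a" and the following row is "b".
is_pair = (df["type"] == "a") & (next_type == "b")

# Sum the boolean mask per user to count the pairs.
result = (
    is_pair.groupby(df["userid"])
    .sum()
    .rename("b_follows_a")
    .reset_index()
)
print(result)
```

Unlike the string-concatenation trick, this comparison works on whole values, so it should also hold up when the type labels are longer than one character.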

Upvotes: 2
