How can I get the count of sequential events pairs from a Pandas dataframe?

Question

I have a dataframe that looks like this:

ID  EVENT   DATE
1   1       142
1   5       167
1   3       245
2   1       54
2   5       87
3   3       165
3   2       178

And I would like to generate something like this:

EVENT_1  EVENT_2  COUNT
1        5        2
5        3        1
3        2        1

The idea is how many items (ID) go from one event to the next one. Don't care about previous states, I just want to consider the next state from the current state (e.g.: for ID 1, I don't want to count a transition from 1 to 3 because first, it goes to event 5 and then to 3). The date format is the number of days from a specific date (sort of like SAS format).

Is there a clean way to achieve this?

Scott Boston · Accepted Answer

Let's try this:

(df.groupby([df['EVENT'].rename('EVENT_1'), 
             df.groupby('ID')['EVENT'].shift(-1).rename('EVENT_2')])['ID']
   .count()).rename('COUNT').reset_index().astype(int)

Output:

|    |   EVENT_1 |   EVENT_2 |   COUNT |
|---:|----------:|----------:|--------:|
|  0 |         1 |         5 |       2 |
|  1 |         3 |         2 |       1 |
|  2 |         5 |         3 |       1 |

Details: Groupby on 'EVENT' and shifted 'EVENT' within each ID, then count.

How can I get the count of sequential events pairs from a Pandas dataframe?

Answers (2)

Related Questions