rubjim
rubjim

Reputation: 13

How can I get the count of sequential events pairs from a Pandas dataframe?

I have a dataframe that looks like this:

ID  EVENT   DATE
1   1       142
1   5       167
1   3       245
2   1       54
2   5       87
3   3       165
3   2       178

And I would like to generate something like this:

EVENT_1  EVENT_2  COUNT
1        5        2
5        3        1
3        2        1

The idea is how many items (ID) go from one event to the next one. Don't care about previous states, I just want to consider the next state from the current state (e.g.: for ID 1, I don't want to count a transition from 1 to 3 because first, it goes to event 5 and then to 3). The date format is the number of days from a specific date (sort of like SAS format).

Is there a clean way to achieve this?

Upvotes: 1

Views: 180

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

Let's try this:

(df.groupby([df['EVENT'].rename('EVENT_1'), 
             df.groupby('ID')['EVENT'].shift(-1).rename('EVENT_2')])['ID']
   .count()).rename('COUNT').reset_index().astype(int)

Output:

|    |   EVENT_1 |   EVENT_2 |   COUNT |
|---:|----------:|----------:|--------:|
|  0 |         1 |         5 |       2 |
|  1 |         3 |         2 |       1 |
|  2 |         5 |         3 |       1 |

Details: Groupby on 'EVENT' and shifted 'EVENT' within each ID, then count.

Upvotes: 1

Chris Adams
Chris Adams

Reputation: 18647

You could use groupby and shift. We'll also use rename_axis and reset_index to tidy up the final output:

(pd.concat([f.groupby([f['EVENT'], f['EVENT'].shift(-1).astype('Int64')]).size()
           for _, f in df.groupby('ID')])
 .groupby(level=[0, 1]).sum()
 .rename_axis(['EVENT_1', 'EVENT_2']).reset_index(name='COUNT'))

[out]

   EVENT_1  EVENT_2  COUNT
0        1        5      2
1        3        2      1
2        5        3      1

Upvotes: 1

Related Questions