izhako
izhako

Reputation: 151

Number winning streak ID's in pandas

I have a Python pandas dataframe with winning streaks for some teams over several time periods and I would like to identfy the streaks chronologically. So, what I have is:

import pandas as pd
data = pd.DataFrame({'period': list(range(1,7))+list(range(1,6)),
    'team_id':       ['A']*6 + ['B']*5,
    'win':           [1,1,1,0,1,1,1,0,0,1,1],
    'streak_length': [1,2,3,0,1,2,1,0,0,1,2]})
print(data)

And what I would like to have is:

result = pd.DataFrame({'period': list(range(1,7))+list(range(1,6)),
    'team_id':       ['A']*6 + ['B']*5,
    'win':           [1,1,1,0,1,1,1,0,0,1,1],
    'streak_length': [1,2,3,0,1,2,1,0,0,1,2],
    'streak_id':     [1,1,1,None,2,2,1,None,None,2,2]})
print(result)

I tried to groupby by team_id and sum over streak length, but it can be repeated, so I think this would not work. Any help appreciated!

Upvotes: 4

Views: 246

Answers (1)

jezrael
jezrael

Reputation: 863291

Create consecutive groups by Series.shift Series.ne and Series.cumsum, filter only 1 in win and use GroupBy.transform with factorize in lambda function:

m = data['win'].eq(1)
g = data['win'].ne(data['win'].shift()).cumsum()

data['streak_id'] = g[m].groupby(data['team_id']).transform(
    lambda x: pd.factorize(x)[0] + 1
)

print (data)
    period team_id  win  streak_length  streak_id
0        1       A    1              1        1.0
1        2       A    1              2        1.0
2        3       A    1              3        1.0
3        4       A    0              0        NaN
4        5       A    1              1        2.0
5        6       A    1              2        2.0
6        1       B    1              1        1.0
7        2       B    0              0        NaN
8        3       B    0              0        NaN
9        4       B    1              1        2.0
10       5       B    1              2        2.0

Upvotes: 6

Related Questions