Reputation: 614
I have a multi-index dataframe and I am trying to count consecutive winners.
The problem is that there are some NaN values interspersed within the columns, which I would like to skip when counting consecutive winners.
                         week_1  week_2  week_3  week_4  week_5  week_6 \
Year
2000 Arizona Cardinals    loser  winner   loser   loser  winner   loser
     Atlanta Falcons     winner   loser  winner   loser   loser   loser
     Baltimore Ravens    winner     NaN  winner  winner  winner  winner
     Buffalo Bills          NaN  winner   loser   loser   loser  winner
     Carolina Panthers    loser  winner   loser   loser  winner   loser
I can use df3 = df.shift(-1, axis=1).isin(['winner']) to make the comparisons, but this does not skip the NaN values.
So a sequence like this:
Baltimore Ravens winner NaN winner
which should count as consecutive wins, is missed.
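To make the problem concrete, here is a minimal sketch on a single three-week row. It uses a Series and `Series.shift(-1)` as a stand-in for the row-wise `df.shift(-1, axis=1)` call above; the helper name `count_win_pairs` and the week labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# One team's results with a bye week (NaN) between two wins.
row = pd.Series(["winner", np.nan, "winner"],
                index=["week_1", "week_2", "week_3"])

def count_win_pairs(s):
    """Count positions where an entry and the next entry are both 'winner'."""
    is_win = s.eq("winner")
    next_is_win = s.shift(-1).isin(["winner"])
    return int((is_win & next_is_win).sum())

# Comparing each week with the next one: the NaN sits between the two wins,
# so no back-to-back pair is found.
naive_pairs = count_win_pairs(row)

# Dropping the NaN first makes the two wins adjacent.
skipped_pairs = count_win_pairs(row.dropna())

print(naive_pairs, skipped_pairs)
```

The first count misses the pair because the NaN sits between the two wins; dropping it first gives the count I am after.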
Upvotes: 0
Views: 105
Reputation: 7361
I tried to find a vectorized solution but didn't manage one.
This is easily solved with a simple Python loop over each row:
    def find_wins(x):
        mw = 0  # longest winning streak seen so far
        c = 0   # length of the current streak
        for e in x.dropna():  # NaN entries are skipped
            c = c + 1 if e == 'winner' else 0
            mw = max(mw, c)
        return mw

    res = df.apply(find_wins, axis=1)
With df being your original dataframe, this returns the following res Series:
    year
    2000  Arizona Cardinals    1
          Atlanta Falcons      1
          Baltimore Ravens     5
          Buffalo Bills        1
          Carolina Panthers    1
    dtype: int64
where each element is the maximum number of consecutive wins (NaN skipped).
The point here is just to use x.dropna() to drop the NaN values before looping over each row and counting the consecutive 'winner' entries.
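For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the five rows shown in the question (only the six weeks visible there) and applies the same loop. The index level name "Team" and the variable names inside the function are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Rows copied from the question; only week_1..week_6 are visible there.
weeks = [f"week_{i}" for i in range(1, 7)]
index = pd.MultiIndex.from_tuples(
    [
        (2000, "Arizona Cardinals"),
        (2000, "Atlanta Falcons"),
        (2000, "Baltimore Ravens"),
        (2000, "Buffalo Bills"),
        (2000, "Carolina Panthers"),
    ],
    names=["Year", "Team"],  # "Team" is an assumed level name
)
df = pd.DataFrame(
    [
        ["loser", "winner", "loser", "loser", "winner", "loser"],
        ["winner", "loser", "winner", "loser", "loser", "loser"],
        ["winner", np.nan, "winner", "winner", "winner", "winner"],
        [np.nan, "winner", "loser", "loser", "loser", "winner"],
        ["loser", "winner", "loser", "loser", "winner", "loser"],
    ],
    index=index,
    columns=weeks,
)

def find_wins(row):
    """Return the longest run of 'winner' in a row, ignoring NaN entries."""
    longest = 0
    current = 0
    for result in row.dropna():
        current = current + 1 if result == "winner" else 0
        longest = max(longest, current)
    return longest

res = df.apply(find_wins, axis=1)
print(res)
```

On this sample the Ravens get 5, because the NaN in week_2 is dropped before counting, and every other team gets 1.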
Upvotes: 1
Reputation: 10590
In order to drop your NaN values and shift the remaining values, you can use apply along axis 1 together with dropna. You have to do a little bit of finagling to shift the values, though:
no_bye = df.apply(lambda x: x.dropna().reset_index(drop=True), axis=1)
no_bye.columns = ['game_' + str(n+1) for n in range(16)]
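As a rough sketch of how this could be carried through to a count, the example below rebuilds the sample rows from the question, left-aligns each row after dropping NaN, and then counts back-to-back wins with a plain NumPy comparison. The column count is taken from the result rather than hard-coded to 16 because only six weeks are shown in the question; `is_win` and `pairs` are illustrative names, and "Team" is an assumed index level name.

```python
import numpy as np
import pandas as pd

# Sample rows from the question (week_1..week_6 only).
index = pd.MultiIndex.from_tuples(
    [(2000, team) for team in [
        "Arizona Cardinals", "Atlanta Falcons", "Baltimore Ravens",
        "Buffalo Bills", "Carolina Panthers",
    ]],
    names=["Year", "Team"],  # "Team" is an assumed level name
)
df = pd.DataFrame(
    [
        ["loser", "winner", "loser", "loser", "winner", "loser"],
        ["winner", "loser", "winner", "loser", "loser", "loser"],
        ["winner", np.nan, "winner", "winner", "winner", "winner"],
        [np.nan, "winner", "loser", "loser", "loser", "winner"],
        ["loser", "winner", "loser", "loser", "winner", "loser"],
    ],
    index=index,
    columns=[f"week_{i}" for i in range(1, 7)],
)

# Drop NaN per row and renumber, so games are left-aligned;
# rows with a bye end up padded with NaN on the right.
no_bye = df.apply(lambda x: x.dropna().reset_index(drop=True), axis=1)
no_bye.columns = [f"game_{n + 1}" for n in range(no_bye.shape[1])]

# Boolean grid of wins; the NaN padding compares as False.
is_win = no_bye.eq("winner").to_numpy()

# A back-to-back win is a win in game k that follows a win in game k-1.
pairs = pd.Series((is_win[:, 1:] & is_win[:, :-1]).sum(axis=1),
                  index=no_bye.index)
print(pairs)
```

Note that this counts adjacent winning pairs rather than the longest streak, so five wins in a row show up as four pairs.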
Upvotes: 1