MasayoMusic

Reputation: 614

Skipping NaN values when counting consecutive values?

I have a multi-index dataframe and I am trying to count consecutive winners. The problem is that there are some NaN values interspersed within the columns, and I would like to skip them when counting consecutive winners:

                   week_1  week_2  week_3  week_4  week_5  week_6  \
Year                                                                     
2000 Arizona Cardinals   loser  winner   loser   loser  winner   loser   
     Atlanta Falcons     winner  loser  winner   loser   loser   loser   
     Baltimore Ravens    winner  NaN   winner  winner  winner  winner   
     Buffalo Bills       NaN     winner   loser   loser   loser  winner   
     Carolina Panthers   loser  winner   loser   loser  winner   loser 

I can use df3 = df.shift(-1, axis=1).isin(['winner']) to make the comparisons, but this does not skip the NaN values.

So for something like this:

Baltimore Ravens    winner  NaN   winner

the two wins should count as consecutive, because the NaN in between is skipped.

Upvotes: 0

Views: 105

Answers (2)

Valentino

Reputation: 7361

I tried to find a vectorized solution but didn't manage to.
This can be solved easily with a simple Python loop over each row:

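def find_wins(x):
    mw = 0  # longest winning streak seen so far
    c = 0   # length of the current streak
    for e in x.dropna():  # NaN entries are skipped entirely
        c = c + 1 if e == 'winner' else 0  # extend the streak or reset it
        mw = max(mw, c)
    return mw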

res = df.apply(find_wins, axis=1)

With df being your original dataframe, this returns the following Series res:

year             
2000  Arizona Cardinals    1
      Atlanta Falcons      1
      Baltimore Ravens     5
      Buffalo Bills        1
      Carolina Panthers    1
dtype: int64

where each element is the maximum number of consecutive wins (NaN skipped).

The point here is just to use x.dropna() to drop the NaN values before looping over each row and counting the consecutive 'winner' entries.
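For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. The frame is rebuilt by hand from the five sample rows in the question, and the index level name "Team" and the helper name longest_win_streak are assumptions of mine. It uses itertools.groupby to measure runs instead of a manual counter, but the key step is still dropna() on each row.

```python
from itertools import groupby

import numpy as np
import pandas as pd

# Sample rows copied from the question; the level name "Team" is an assumption.
index = pd.MultiIndex.from_tuples(
    [
        (2000, "Arizona Cardinals"),
        (2000, "Atlanta Falcons"),
        (2000, "Baltimore Ravens"),
        (2000, "Buffalo Bills"),
        (2000, "Carolina Panthers"),
    ],
    names=["Year", "Team"],
)
columns = [f"week_{n}" for n in range(1, 7)]
df = pd.DataFrame(
    [
        ["loser", "winner", "loser", "loser", "winner", "loser"],
        ["winner", "loser", "winner", "loser", "loser", "loser"],
        ["winner", np.nan, "winner", "winner", "winner", "winner"],
        [np.nan, "winner", "loser", "loser", "loser", "winner"],
        ["loser", "winner", "loser", "loser", "winner", "loser"],
    ],
    index=index,
    columns=columns,
)


def longest_win_streak(row):
    """Return the longest run of 'winner' in a row, ignoring NaN entries."""
    results = row.dropna()  # bye weeks (NaN) no longer break a streak
    # groupby yields one group per run of equal adjacent values
    runs = (
        sum(1 for _ in group)
        for value, group in groupby(results)
        if value == "winner"
    )
    return max(runs, default=0)  # 0 when the row has no wins at all


res = df.apply(longest_win_streak, axis=1)
print(res)
```

On these five rows it gives the same values as the Series shown above.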

Upvotes: 1

busybear

Reputation: 10590

In order to drop your NaN values and shift values, you can use apply along axis 1 and dropna. You have to do a little bit of finagling though to shift the values:

no_bye = df.apply(lambda x: x.dropna().reset_index(drop=True), axis=1)
# 16 assumes a 17-week season with one bye per team; adjust to your data
no_bye.columns = ['game_' + str(n+1) for n in range(16)]
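Once the NaN values are dropped and the remaining results are shifted left, the streak count itself can be done without a Python loop. The following is only a sketch under a few assumptions: the frame is rebuilt by hand from the question's sample rows, the names wins, streak and longest are mine, and the column names are derived from the resulting width rather than a fixed count because the sample has only six weeks. The trick is a cumulative sum of wins minus the cumulative sum carried forward from the most recent non-win.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the level name "Team" is an assumption.
index = pd.MultiIndex.from_tuples(
    [
        (2000, "Arizona Cardinals"),
        (2000, "Atlanta Falcons"),
        (2000, "Baltimore Ravens"),
        (2000, "Buffalo Bills"),
        (2000, "Carolina Panthers"),
    ],
    names=["Year", "Team"],
)
df = pd.DataFrame(
    [
        ["loser", "winner", "loser", "loser", "winner", "loser"],
        ["winner", "loser", "winner", "loser", "loser", "loser"],
        ["winner", np.nan, "winner", "winner", "winner", "winner"],
        [np.nan, "winner", "loser", "loser", "loser", "winner"],
        ["loser", "winner", "loser", "loser", "winner", "loser"],
    ],
    index=index,
    columns=[f"week_{n}" for n in range(1, 7)],
)

# Drop NaN per row and shift the remaining results to the left.
no_bye = df.apply(lambda x: x.dropna().reset_index(drop=True), axis=1)
no_bye.columns = ["game_" + str(n + 1) for n in range(no_bye.shape[1])]

# True where the team won; trailing NaN padding compares as False.
wins = no_bye.eq("winner")

# Running total of wins along each row.
total = wins.cumsum(axis=1)

# Value of the running total at the most recent non-win (0 before the first one).
baseline = total.where(~wins).ffill(axis=1).fillna(0)

# Length of the current streak at each game, then the longest one per row.
streak = total - baseline
longest = streak.max(axis=1).astype(int)
print(longest)
```

Rows with fewer games than others end up padded with NaN on the right; since the padding sits at the end of the row, it cannot join two separate streaks.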

Upvotes: 1
