iordanoffff
iordanoffff

Reputation: 3

Pandas iterative imputation without using 'for' loop

Given the following dataset, I must insert the last winner of each tennis confrontation if there is one.

player_1 player_2 winner last_winner
Federer Nadal Federer
Djoko Federer Djoko
Nadal Federer Nadal
Djoko Federer Federer
Murray Djoko Murray
Djoko Federer Djoko

Then it should look like this :

player_1 player_2 winner last_winner
Federer Nadal Federer none
Djoko Federer Djoko none
Nadal Federer Nadal Federer
Djoko Federer Federer Djoko
Murray Djoko Murray none
Djoko Federer Djoko Federer

This is the code I use with a for loop :

for i,j in tennis.iterrows() :
    
    try :
        tennis.loc[i, 'last_winner'] = tennis.iloc[:i].loc[(tennis.player_1.isin([j.player_1, j.player_2]) & tennis.player_2.isin([j.player_1, j.player_2])), "winner"].iloc[-1]
    except:
        tennis.loc[i, 'last_winner'] = "none"

Is there any mean to perform the same operation without any for loop, using something like an apply function ?

Upvotes: 0

Views: 30

Answers (1)

Rawson
Rawson

Reputation: 2787

In a single line of code:

df["last_winner"] = df.groupby(
    pd.Series(
        df[["player_1", "player_2"]].values.tolist()) \
        .apply(sorted).astype(str)) \
    ["winner"].shift()

This can be split up into sections:

  1. Create a list for each row of players 1 and 2 (as a pd.Series)
  2. Sort this list so that all rows will match (i.e. doesn't matter if players are 1 or 2 always)
  3. Turn the lists into strings for the groupby
  4. Groupby this series
  5. Take the "winner" column for each group and shift.
  6. Make this a new column of the DataFrame.

Upvotes: 1

Related Questions