mgadfly

Reputation: 57

Fill column in pandas based on prior row data until change

I have a data frame:

import pandas as pd

df = pd.DataFrame({'player':['John Adams', 'Mark Capone', 'Cecil Milton', 'Hector James', 'Hector James', 'Luke Valentine', 'Luke Valentine'], 'action':['Starts at PG', 'Dribbles', 'Passes', 'receives pass', 'Travels', 'Subs in at PG', 'Passes']})

The first column is the player. The second column is the action the player takes.

I want to create a third column that tracks who is in at PG. I add the column:

df['PG'] = " "

I then write the following to populate the PG column with the name of the player:

df.loc[(df.action == 'Starts at PG'), 'PG'] = df['player']

df.loc[(df.action == 'Subs in at PG'), 'PG'] = df['player']

The part I can't figure out is how to forward-fill the PG column until it changes at row 5, and then fill it with the new value from row 5 to the end. I've used ffill on numeric columns before, but this seems different because I'm working with strings. Any help is greatly appreciated.

To be clear, I'm trying to get "John Adams" in the PG column for rows 0 through 4 and "Luke Valentine" for rows 5 and 6.
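A minimal sketch of why the approach above stalls, and one way to complete it. The string type is not the obstacle: `ffill` only replaces values pandas treats as missing (NaN/None), and the `" "` placeholder is an ordinary non-missing string, so there is nothing for `ffill` to fill. Initializing the column with `None` instead keeps the same `loc`-based logic working. The names `is_pg_event` and `placeholder` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'player': ['John Adams', 'Mark Capone', 'Cecil Milton', 'Hector James',
               'Hector James', 'Luke Valentine', 'Luke Valentine'],
    'action': ['Starts at PG', 'Dribbles', 'Passes', 'receives pass',
               'Travels', 'Subs in at PG', 'Passes'],
})

# With " " as the placeholder, ffill leaves the column unchanged,
# because a space string is not a missing value.
placeholder = pd.Series(['John Adams', ' ', ' ']).ffill()
print(placeholder.tolist())

# Start with None (missing) instead, giving an object column ffill can fill.
df['PG'] = None

# Rows where a player takes over at PG (the two action strings from the question).
is_pg_event = df['action'].isin(['Starts at PG', 'Subs in at PG'])
df.loc[is_pg_event, 'PG'] = df.loc[is_pg_event, 'player']

# Carry each PG name forward until the next PG change.
df['PG'] = df['PG'].ffill()

print(df)
```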


Upvotes: 0

Views: 1542

Answers (1)

Quang Hoang

Reputation: 150735

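Try `where` plus `ffill`: `where` keeps the player name only on rows whose action mentions PG and turns every other row into NaN, then `ffill` forward-fills those NaN values with the most recent name: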

 df['PG'] = df.player.where(df.action.str.contains('PG')).ffill()

Output:

           player         action              PG
0      John Adams   Starts at PG      John Adams
1     Mark Capone       Dribbles      John Adams
2    Cecil Milton         Passes      John Adams
3    Hector James  receives pass      John Adams
4    Hector James        Travels      John Adams
5  Luke Valentine  Subs in at PG  Luke Valentine
6  Luke Valentine         Passes  Luke Valentine
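A sketch of the same idea split into its two steps, with one variation that is an assumption rather than part of the answer above: `str.contains('PG')` matches any action containing "PG" (a hypothetical action such as "Passes to PG" would also count as a PG change), so this version uses `isin` with the exact action strings from the question. The names `pg_actions` and `pg_marks` are illustrative. Rows before the first PG action would remain NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'player': ['John Adams', 'Mark Capone', 'Cecil Milton', 'Hector James',
               'Hector James', 'Luke Valentine', 'Luke Valentine'],
    'action': ['Starts at PG', 'Dribbles', 'Passes', 'receives pass',
               'Travels', 'Subs in at PG', 'Passes'],
})

# Actions that put a new player in at PG (exact strings, not substring matches).
pg_actions = ['Starts at PG', 'Subs in at PG']

# Step 1: keep the player name on PG-change rows; every other row becomes NaN.
pg_marks = df['player'].where(df['action'].isin(pg_actions))

# Step 2: forward-fill the NaN rows with the most recent PG name.
df['PG'] = pg_marks.ffill()

print(pg_marks.tolist())
print(df)
```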

Upvotes: 1
