Reputation: 57
I have a data frame:
import pandas as pd
df = pd.DataFrame({'player':['John Adams', 'Mark Capone', 'Cecil Milton', 'Hector James', 'Hector James', 'Luke Valentine', 'Luke Valentine'], 'action':['Starts at PG', 'Dribbles', 'Passes', 'receives pass', 'Travels', 'Subs in at PG', 'Passes']})
The first column is the player. The second column is the action the player takes.
I want to create a third column that tracks who is in at PG. I add the column:
df['PG'] = " "
I then write the following to populate the PG column with the name of the player:
df.loc[(df.action == 'Starts at PG'), 'PG'] = df['player']
df.loc[(df.action == 'Subs in at PG'), 'PG'] = df['player']
The issue I cannot figure out is how to forward fill the PG column until it is changed at row 5, and then fill with the new value from 5 to the end. I've used ffill on numeric columns before, but this is different because it is a string I'm working with. Any help is greatly appreciated.
To be clear, I'm trying to get "John Adams" in the PG column for rows 0 through 4 and "Luke Valentine" for rows 5 and 6.
Upvotes: 0
Views: 1542
Reputation: 150735
Try ffill, which forward fills all NaN values:
df['PG'] = df.player.where(df.action.str.contains('PG')).ffill()
Output:
           player         action              PG
0      John Adams   Starts at PG      John Adams
1     Mark Capone       Dribbles      John Adams
2    Cecil Milton         Passes      John Adams
3    Hector James  receives pass      John Adams
4    Hector James        Travels      John Adams
5  Luke Valentine  Subs in at PG  Luke Valentine
6  Luke Valentine         Passes  Luke Valentine
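For anyone wondering why ffill seemed not to work on the string column in the question: the likely reason is the " " placeholder, not the dtype. A space is an ordinary string, so ffill has nothing to fill. Here is a rough sketch that splits the one-liner into its two steps and then repeats the placeholder approach for comparison. The names marked, placeholder, unchanged and fixed are only illustrative, and the mask call is one possible way to turn the placeholder into NaN, not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    'player': ['John Adams', 'Mark Capone', 'Cecil Milton', 'Hector James',
               'Hector James', 'Luke Valentine', 'Luke Valentine'],
    'action': ['Starts at PG', 'Dribbles', 'Passes', 'receives pass',
               'Travels', 'Subs in at PG', 'Passes'],
})

# Rows where the action mentions PG (same test as in the one-liner above).
is_pg_change = df['action'].str.contains('PG')

# Step 1: keep the player's name on those rows; every other row becomes NaN.
marked = df['player'].where(is_pg_change)

# Step 2: forward fill carries the last non-missing name down the column.
df['PG'] = marked.ffill()

# The approach from the question: a " " placeholder is a regular string,
# not a missing value, so ffill leaves those rows untouched.
placeholder = pd.Series(" ", index=df.index)
placeholder.loc[is_pg_change] = df['player']
unchanged = placeholder.ffill()

# Turning the placeholder into NaN first lets ffill do its job.
fixed = placeholder.mask(placeholder == " ").ffill()

print(df)
```

So if you would rather keep the two df.loc assignments from the question, initialising the column with a missing value instead of " " and then calling ffill on it should give the same result.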
Upvotes: 1