Reputation: 2416
I have the following Dataframe:
Track | FGrating | HorseId | Last FGrating at Happy Valley Grass |
---|---|---|---|
Happy Valley grass | 97 | 22609 | |
Happy Valley grass | 106 | 22609 | 97 |
Happy Valley grass | 104 | 22609 | 106 |
Happy Valley grass | 102 | 22609 | 104 |
Happy Valley grass | 95 | 22609 | 102 |
Sha Tin grass | 108 | 22609 | |
Sha Tin grass | 104 | 22609 | |
Happy Valley grass | 107 | 22609 | 95 |
Sha Tin grass | 102 | 22609 | |
Happy Valley grass | 108 | 22609 | 107 |
I need to fill the empty cells of the rightmost column according to these two rules:
The end result will be like this:
Track | FGrating | HorseId | Last FGrating at Happy Valley Grass |
---|---|---|---|
Happy Valley grass | 97 | 22609 | 0 (rule 1) |
Happy Valley grass | 106 | 22609 | 97 |
Happy Valley grass | 104 | 22609 | 106 |
Happy Valley grass | 102 | 22609 | 104 |
Happy Valley grass | 95 | 22609 | 102 |
Sha Tin grass | 108 | 22609 | 95 (rule 2) |
Sha Tin grass | 104 | 22609 | 95 (rule 2) |
Happy Valley grass | 107 | 22609 | 95 |
Sha Tin grass | 102 | 22609 | 107 (rule 2) |
Happy Valley grass | 108 | 22609 | 107 |
I need this for every HorseId
on the Dataframe
I tried doing a backfill then filling with 0, something like this:
df['Last FGrating at Happy Valley Grass'] = df.groupby('HorseId')['Last FGrating at Happy Valley Grass'].apply(
lambda x: x.fillna(method='bfill').fillna(0))
but it backfills everything, ignoring rule 1.
Can you help me here? Doing some research here, I have the feeling that it is a bit more complicated than a string of fillna
.
Upvotes: 2
Views: 176
Reputation: 11321
Following up on the information from the comments, I'd propose something like:
def fill(sdf, track):
mask = sdf['Track'].eq(track).cummax()
if mask.any():
first = mask[mask].index[0]
if pd.isna(sdf.at[first, 'Last FGrating at Happy Valley Grass']):
sdf.loc[~mask, 'Last FGrating at Happy Valley Grass'] = 0
sdf.at[first, 'Last FGrating at Happy Valley Grass'] = 0
sdf['Last FGrating at Happy Valley Grass'] = (
sdf['Last FGrating at Happy Valley Grass'].fillna(method='bfill')
)
return sdf
track = "Sha Tin grass"
df[track] = (
df.groupby('HorseId').apply(fill, track)['Last FGrating at Happy Valley Grass']
)
Since the operation depends on the track
, I've made it an argument for the fill
function. I'd also put the result into a new column (here named after the chosen track
) to not override the base column.
Variation, which I like better (using numerical access):
def fill(sdf, track):
col = sdf.columns.to_list().index('Last FGrating at Happy Valley Grass')
mask = sdf['Track'].eq(track)
if mask.any():
first = mask.values.argmax()
if pd.isna(sdf.iat[first, col]):
sdf.iloc[:first + 1, col] = 0
sdf.iloc[:, col] = sdf.iloc[:, col].fillna(method='bfill')
return sdf
EDIT: Removed the first insufficient attempts, made slight adjustments to current answer, and added a variation.
Upvotes: 3