Conditional backfill of Pandas columns, another variation

Question

I have the following Dataframe:

Track	FGrating	HorseId	Last FGrating at Happy Valley Grass
Happy Valley grass	97	22609
Happy Valley grass	106	22609	97
Happy Valley grass	104	22609	106
Happy Valley grass	102	22609	104
Happy Valley grass	95	22609	102
Sha Tin grass	108	22609
Sha Tin grass	104	22609
Happy Valley grass	107	22609	95
Sha Tin grass	102	22609
Happy Valley grass	108	22609	107

I need to fill the empty cells of the rightmost column according to these two rules:

If the horse didn't race on the particular track yet (Happy Valley grass, in this example), then the value to be filled is 0;
Between two races at the particular track (Happy Valley grass, in this example), the value to be filled is the last FGrating on the track in question (the two consecutive rows with Sha Tin grass will get the value 95 and the third one will get 107).

The end result will be like this:

Track	FGrating	HorseId	Last FGrating at Happy Valley Grass
Happy Valley grass	97	22609	0 (rule 1)
Happy Valley grass	106	22609	97
Happy Valley grass	104	22609	106
Happy Valley grass	102	22609	104
Happy Valley grass	95	22609	102
Sha Tin grass	108	22609	95 (rule 2)
Sha Tin grass	104	22609	95 (rule 2)
Happy Valley grass	107	22609	95
Sha Tin grass	102	22609	107 (rule 2)
Happy Valley grass	108	22609	107

I need this for every HorseId on the Dataframe

I tried doing a backfill then filling with 0, something like this:

df['Last FGrating at Happy Valley Grass'] = df.groupby('HorseId')['Last FGrating at Happy Valley Grass'].apply(
        lambda x: x.fillna(method='bfill').fillna(0))

but it backfills everything, ignoring rule 1.

Can you help me here? Doing some research here, I have the feeling that it is a bit more complicated than a string of fillna.

Timus · Accepted Answer

Following up on the information from the comments, I'd propose something like:

def fill(sdf, track):
    mask = sdf['Track'].eq(track).cummax()
    if mask.any():
        first = mask[mask].index[0]
        if pd.isna(sdf.at[first, 'Last FGrating at Happy Valley Grass']):
            sdf.loc[~mask, 'Last FGrating at Happy Valley Grass'] = 0
            sdf.at[first, 'Last FGrating at Happy Valley Grass'] = 0 
    sdf['Last FGrating at Happy Valley Grass'] = (
        sdf['Last FGrating at Happy Valley Grass'].fillna(method='bfill')
    )
    return sdf

track = "Sha Tin grass"
df[track] = (
    df.groupby('HorseId').apply(fill, track)['Last FGrating at Happy Valley Grass']
)

Since the operation depends on the track, I've made it an argument for the fill function. I'd also put the result into a new column (here named after the chosen track) to not override the base column.

Variation, which I like better (using numerical access):

def fill(sdf, track):
    col = sdf.columns.to_list().index('Last FGrating at Happy Valley Grass')
    mask = sdf['Track'].eq(track)
    if mask.any():
        first = mask.values.argmax()
        if pd.isna(sdf.iat[first, col]):
            sdf.iloc[:first + 1, col] = 0
    sdf.iloc[:, col] = sdf.iloc[:, col].fillna(method='bfill')
    return sdf

EDIT: Removed the first insufficient attempts, made slight adjustments to current answer, and added a variation.

Conditional backfill of Pandas columns, another variation

Answers (1)

Related Questions