Baron Yugovich

Reputation: 4315

Apply sklearn logloss with rolling on pandas dataframe

My function call looks something like

loss = log_loss(y_true=validate_df['y'], y_pred=validate_probs, sample_weight=validate_df['weight'], normalize=True)

Is there any way to combine this with pandas rolling() functionality, so that the loss is calculated over a trailing window of, for example, 10k rows?

Upvotes: 1

Views: 54

Answers (1)

Sachin Hosmani

Reputation: 1762

I couldn't find a very clean way to make rolling() work on several columns of a dataframe at once, but here is the best I could do: roll over a single column and use a custom window function that looks up the other columns by the window's index and then applies log_loss.


import pandas as pd
import numpy as np
from sklearn.metrics import log_loss

# Everything in one dataframe, but you can have your pred in a separate one
# if you want
df = pd.DataFrame({
    'y': [1, 0, 1, 1, 0, 1, 0, 1],
    'y_pred': [0.7, 0.3, 0.8, 0.9, 0.4, 0.6, 0.2, 0.8],
    'weight': [1.0, 1.5, 0.5, 1.0, 2.0, 1.0, 0.8, 1.2]
})

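def weighted_log_loss(window):
    # window is a Series whose values we're not interested in; we only use
    # its index to `loc` the matching rows of the other columns
    y = df.loc[window.index, 'y']
    y_pred = df.loc[window.index, 'y_pred']
    weight = df.loc[window.index, 'weight']
    return log_loss(
        y_true=y,
        y_pred=y_pred,
        sample_weight=weight,
        normalize=True,
        # without explicit labels, log_loss raises if a window happens to
        # contain only one class
        labels=[0, 1]
    )

window_size = 3
# raw=False passes each window as a Series (with its index) rather than a
# bare ndarray; the first window_size - 1 results are NaN
print(df['y'].rolling(window=window_size).apply(weighted_log_loss, raw=False))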


It turns out there is also a rolling_apply function (source) that works directly with multi-column dataframes, which might suit you better.
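
For a window as large as the 10k rows mentioned in the question, calling log_loss once per window may get slow. One possible alternative, for the binary case only, is to compute the per-row loss once and then take a ratio of rolling sums, since a weighted mean over a window is sum(w * loss) / sum(w). This is only a sketch: rolling_weighted_log_loss is a hypothetical helper name, it assumes 0/1 labels with y_pred being the probability of class 1, and the clipping constant is my own choice to avoid log(0), so results may differ slightly from log_loss for probabilities at exactly 0 or 1.

```python
import numpy as np
import pandas as pd


def rolling_weighted_log_loss(y_true, y_pred, weight, window):
    """Trailing weighted binary log loss computed from rolling sums.

    Hypothetical helper, not part of pandas or sklearn.
    """
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    w = np.asarray(weight, dtype=float)

    # Assumption: clip probabilities away from 0 and 1 so the logs stay finite.
    eps = np.finfo(float).eps
    p = np.clip(p, eps, 1.0 - eps)

    # Per-row negative log-likelihood of the true label.
    row_loss = -(y * np.log(p) + (1.0 - y) * np.log1p(-p))

    # Weighted mean per window = rolling sum(w * loss) / rolling sum(w).
    numerator = pd.Series(w * row_loss).rolling(window).sum()
    denominator = pd.Series(w).rolling(window).sum()
    return numerator / denominator


df = pd.DataFrame({
    'y': [1, 0, 1, 1, 0, 1, 0, 1],
    'y_pred': [0.7, 0.3, 0.8, 0.9, 0.4, 0.6, 0.2, 0.8],
    'weight': [1.0, 1.5, 0.5, 1.0, 2.0, 1.0, 0.8, 1.2]
})

rolling_loss = rolling_weighted_log_loss(
    df['y'], df['y_pred'], df['weight'], window=3
)
print(rolling_loss)
```

As with rolling().apply, the first window - 1 entries are NaN. The result has a fresh 0..n-1 index, so reassign the original index if you need it to line up with your dataframe.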

Upvotes: 1
