Reputation: 4315
My function call looks something like
loss = log_loss(y_true=validate_df['y'], y_pred=validate_probs, sample_weight=validate_df['weight'], normalize=True)
Is there any way to combine this with the pandas rolling()
functionality, so that I can calculate it over a trailing window of, for example, 10k rows?
Upvotes: 1
Views: 54
Reputation: 1762
I couldn't find a very clean way to make rolling()
work on a multi-column dataframe, but here is the best I could do, using a custom window function that applies log_loss to the rows of each window:
import pandas as pd
import numpy as np
from sklearn.metrics import log_loss

# Everything in one dataframe, but you can keep your predictions in a separate
# one if you want
df = pd.DataFrame({
    'y': [1, 0, 1, 1, 0, 1, 0, 1],
    'y_pred': [0.7, 0.3, 0.8, 0.9, 0.4, 0.6, 0.2, 0.8],
    'weight': [1.0, 1.5, 0.5, 1.0, 2.0, 1.0, 0.8, 1.2]
})

def weighted_log_loss(window):
    # `window` is a Series whose contents we're not interested in; we only use
    # its index to `loc` the matching rows from the other columns
    y = df.loc[window.index, 'y']
    y_pred = df.loc[window.index, 'y_pred']
    weight = df.loc[window.index, 'weight']
    return log_loss(
        y_true=y,
        y_pred=y_pred,
        sample_weight=weight,
        normalize=True
    )

window_size = 3
print(df['y'].rolling(window=window_size).apply(weighted_log_loss))
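Note that the first window_size - 1 entries of the result are NaN, because rolling() only evaluates the function once a full window is available (min_periods defaults to the window size).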
It turns out there is a rolling_apply
function (source) that works directly with multi-column data, and that might suit you better.
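Assuming the rolling_apply referred to is the one provided by the numpy_ext package (an assumption on my part), a minimal sketch of the same computation might look like this; it passes aligned windows of each column's values to the metric function instead of going through the window's index:

import numpy as np
import pandas as pd
from numpy_ext import rolling_apply  # assumes the numpy_ext package is installed
from sklearn.metrics import log_loss

df = pd.DataFrame({
    'y': [1, 0, 1, 1, 0, 1, 0, 1],
    'y_pred': [0.7, 0.3, 0.8, 0.9, 0.4, 0.6, 0.2, 0.8],
    'weight': [1.0, 1.5, 0.5, 1.0, 2.0, 1.0, 0.8, 1.2]
})

def weighted_log_loss(y, y_pred, weight):
    # Each argument is the slice of one column for the current window.
    # labels is passed explicitly so that a window containing only one
    # class does not make log_loss raise.
    return log_loss(y_true=y, y_pred=y_pred, sample_weight=weight,
                    normalize=True, labels=[0, 1])

window_size = 3
losses = rolling_apply(weighted_log_loss, window_size,
                       df['y'].values, df['y_pred'].values, df['weight'].values)
# rolling_apply returns one value per row; positions before the first
# full window are filled with NaN.
print(pd.Series(losses, index=df.index))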
Upvotes: 1