In my code, I am trying to pass sample_weight to the StandardScaler. However, this StandardScaler is inside a Pipeline, which in turn is inside a FeatureUnion. I can't seem to get the parameter name right: scaler_pipeline__scaler__sample_weight, which should be passed to the fit method of the preprocessor object.
I get the following error: KeyError: 'scaler_pipeline'
What should this parameter name be? Alternatively, if there is a generally better way to do this, feel free to propose it.
The code below is a standalone example.
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler
import pandas as pd


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Select only specified columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.columns]

    def set_output(self, *, transform=None):
        return self


df = pd.DataFrame({'ds': [1, 2, 3, 4], 'y': [1, 2, 3, 4], 'a': [1, 2, 3, 4], 'b': [1, 2, 3, 4], 'c': [1, 2, 3, 4]})
sample_weight = [0, 1, 1, 1]

scaler_pipeline = Pipeline(
    [
        (
            "selector",
            ColumnSelector(['a', 'b']),
        ),
        ("scaler", StandardScaler()),
    ]
)

remaining_pipeline = Pipeline([("selector", ColumnSelector(["ds", "y"]))])

# FeatureUnion fitting training data
preprocessor = FeatureUnion(
    transformer_list=[
        ("scaler_pipeline", scaler_pipeline),
        ("remaining_pipeline", remaining_pipeline),
    ]
).set_output(transform="pandas")

df_training_transformed = preprocessor.fit_transform(
    df, scaler_pipeline__scaler__sample_weight=sample_weight
)
Upvotes: 3
Views: 152
fit_transform has no parameter called scaler_pipeline__scaler__sample_weight.
Instead, it expects to receive "parameters passed to the fit method of each step" as a dict keyed by strings, "where each parameter name is prefixed such that parameter p for step s has key s__p".
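To make the s__p convention concrete, here is a minimal plain-Python sketch of how a prefixed key can be peeled apart one level at a time. The helper split_step_param is hypothetical and only mimics the naming convention; it is not scikit-learn's actual implementation.

```python
def split_step_param(key):
    """Split 'step__param' on the first double underscore (hypothetical helper)."""
    step, sep, param = key.partition("__")
    if not sep:
        raise ValueError(f"{key!r} has no 'step__' prefix")
    return step, param


# The outermost prefix names the step; whatever remains is handed to that step.
outer_step, inner_key = split_step_param("scaler_pipeline__scaler__sample_weight")

# The remainder is itself a prefixed key for the next level down.
inner_step, param = split_step_param(inner_key)

print(outer_step, inner_step, param)
```

Each level only consumes its own prefix, so the name a nested step finally sees depends on how many containers the key passed through.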
So, in your example, it should be:
df_training_transformed = preprocessor.fit_transform(
    df, {"scaler_pipeline__scaler__sample_weight": sample_weight}
)
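Whichever form you end up using, it is worth checking that the weights actually reached the scaler, because the second positional argument of fit_transform (and fit) is y. The sketch below is not the original setup: it assumes a standalone one-step Pipeline with a step named scaler, NumPy input instead of a DataFrame, and scikit-learn's default configuration (metadata routing not enabled). The helper fitted_mean is made up for illustration. It compares the scaler's learned mean_ for three calls.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])
sample_weight = [0, 1, 1, 1]
key = "scaler__sample_weight"  # step name "scaler" + fit parameter "sample_weight"


def fitted_mean(*args, **kwargs):
    """Fit a fresh one-step pipeline and return the scaler's learned mean."""
    pipe = Pipeline([("scaler", StandardScaler())])
    pipe.fit(X, *args, **kwargs)
    return float(pipe.named_steps["scaler"].mean_[0])


unweighted = fitted_mean()                              # no weights at all
as_keyword = fitted_mean(**{key: sample_weight})        # weights as a keyword argument
as_positional_dict = fitted_mean({key: sample_weight})  # dict is bound to y

print(unweighted, as_keyword, as_positional_dict)
```

With weights [0, 1, 1, 1] the first row should be ignored, so a weighted fit gives a mean of 3.0 instead of 2.5. In this sketch I would expect only the keyword form to show the weighted mean, since the positional dict is treated as y and StandardScaler ignores y. The same mean_ check can be run on the scaler inside the FeatureUnion to see whether a given call had any effect.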
Upvotes: 1