snoisia


Pipeline does not apply my conversion functions when I add a scaler to the ColumnTransformer

I am trying to deploy the model as a .pkl file, but I am running into problems when building the pipeline. Here is the code that works without any trouble:

import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline, Pipeline
from sklearn.compose import TransformedTargetRegressor, ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, RobustScaler


# Element-wise string -> float conversions (convert_* are defined further down)
power_transformer = FunctionTransformer(lambda x: x.applymap(convert_power), validate=False)
mileage_transformer = FunctionTransformer(lambda x: x.applymap(convert_mileage), validate=False)
engine_transformer = FunctionTransformer(lambda x: x.applymap(convert_engine), validate=False)


power_pipeline = Pipeline([
    ('convert_power', power_transformer),
    ('impute', SimpleImputer(strategy='median'))
])


mileage_pipeline = Pipeline([
    ('convert_mileage', mileage_transformer),
    ('impute', SimpleImputer(strategy='median'))
])


engine_pipeline = Pipeline([
    ('convert_engine', engine_transformer),
    ('impute', SimpleImputer(strategy='median'))
])

# seats_imputer is defined elsewhere in my code (not shown here)
seats_pipeline = Pipeline([
    ('imputer', seats_imputer)
])

cat_cols = ['Location', 'Fuel_Type', 'Transmission', 'Owner_Type']
# num_cols = ['Year', 'Kilometers_Driven', 'Mileage', 'Engine', 'Power', 'Seats']  # columns the scaler should be applied to
preprocessor = ColumnTransformer(
    transformers=[
        ('power', power_pipeline, ['Power']),
        ('mileage', mileage_pipeline, ['Mileage']),
        ('engine', engine_pipeline, ['Engine']),
        ('seats', seats_pipeline, ['Seats']),
        ('categorical', OneHotEncoder(handle_unknown='ignore', drop='first'), cat_cols),
    ],
    remainder='passthrough'  # Keep other columns as is
)


columns_to_drop = ['Unnamed: 0', 'Name', 'New_Price']

full_pipeline = Pipeline([
    ('dropping', FunctionTransformer(lambda df: pd.DataFrame(df.drop(columns=columns_to_drop)))),
    ('preprocessing', preprocessor)
])

transformed_data = full_pipeline.fit_transform(X_train_copy)

This code works fine. However, if I add a scaler inside the ColumnTransformer:

    ...
    transformers=[
        ('power', power_pipeline, ['Power']),
        ('mileage', mileage_pipeline, ['Mileage']),
        ('engine', engine_pipeline, ['Engine']),
        ('seats', seats_pipeline, ['Seats']),
        ('numerical', RobustScaler(), num_cols),
        ('categorical', OneHotEncoder(handle_unknown='ignore', drop='first'), cat_cols),
    ],
    ...

I get this error:

ValueError: could not convert string to float: '26.6 km/kg'

This does not happen with the previous code: without the scaler, all my values are converted to float and I get no NaN values.
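A likely explanation, offered as a reading of the documented ColumnTransformer behaviour rather than something confirmed in the question: ColumnTransformer runs its entries side by side, and each entry receives the original input columns it names, not the output of another entry. The 'numerical' RobustScaler would therefore be fitted on the raw 'Mileage' strings such as '26.6 km/kg'. The minimal reproduction below uses a hypothetical three-row frame; leading_float, to_floats and try_fit are illustrative helpers, and leading_float is a simplified stand-in for convert_mileage.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, RobustScaler


def leading_float(value):
    # Simplified, hypothetical stand-in for convert_mileage: '26.6 km/kg' -> 26.6.
    return float(value.split()[0]) if isinstance(value, str) else np.nan


def to_floats(X):
    # Apply leading_float to every cell of the DataFrame slice this branch receives.
    return X.apply(lambda col: col.map(leading_float)).astype(float)


def try_fit(transformers):
    """Fit a ColumnTransformer on raw strings; return the output or the error text."""
    raw = pd.DataFrame({"Mileage": ["26.6 km/kg", "19.67 kmpl", "18.2 kmpl"]}, dtype=object)
    try:
        return ColumnTransformer(transformers).fit_transform(raw)
    except ValueError as exc:
        return f"ValueError: {exc}"


mileage_branch = Pipeline([
    ("convert", FunctionTransformer(to_floats)),
    ("impute", SimpleImputer(strategy="median")),
])

# One entry only: the converted column comes out as floats.
without_scaler = try_fit([("mileage", mileage_branch, ["Mileage"])])
print(without_scaler)

# A second entry on the same column is fitted on the ORIGINAL strings,
# not on mileage_branch's output, so RobustScaler cannot convert them.
with_scaler = try_fit([
    ("mileage", mileage_branch, ["Mileage"]),
    ("numerical", RobustScaler(), ["Mileage"]),
])
print(with_scaler)
```

If this reading is right, it also means that, even with purely numeric columns, listing a column in both a per-column pipeline and the 'numerical' entry would make it appear twice in the output.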

Here are the conversion functions I used:

def convert_power(value: str) -> float:
    if pd.isna(value) or not isinstance(value, str):
        return np.nan
    try:
        return float(value.split()[0])
    except ValueError:
        return np.nan


def convert_engine(value: str) -> float:
    if pd.isna(value) or not isinstance(value, str):
        return np.nan
    try:
        return float(value.split()[0])
    except ValueError:
        return np.nan


def convert_mileage(value: str) -> float:
    if pd.isna(value):
        return np.nan
    result = float(value.split()[0])
    if 'km/kg' in value:
        result *= 1.4
    return result
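For reference, here is what these helpers return for a few values taken from the head of the data (the functions are copied unchanged from above so the snippet runs on its own). One difference worth noting: convert_mileage, unlike the other two, has no isinstance check or try/except, so a non-NaN, non-string value such as a float would raise AttributeError instead of returning NaN.

```python
import math

import numpy as np
import pandas as pd


def convert_power(value: str) -> float:
    if pd.isna(value) or not isinstance(value, str):
        return np.nan
    try:
        return float(value.split()[0])
    except ValueError:
        return np.nan


def convert_engine(value: str) -> float:
    if pd.isna(value) or not isinstance(value, str):
        return np.nan
    try:
        return float(value.split()[0])
    except ValueError:
        return np.nan


def convert_mileage(value: str) -> float:
    if pd.isna(value):
        return np.nan
    result = float(value.split()[0])
    if 'km/kg' in value:
        result *= 1.4
    return result


print(convert_power("58.16 bhp"))     # 58.16
print(convert_engine("998 CC"))       # 998.0
print(convert_mileage("19.67 kmpl"))  # 19.67
print(convert_mileage("26.6 km/kg"))  # 26.6 * 1.4, i.e. about 37.24
print(convert_power(np.nan))          # nan
```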

Here is the head of the DataFrame I am working with:

   Unnamed: 0  Name                              Location    Year  Kilometers_Driven  Fuel_Type  Transmission  Owner_Type  Mileage     Engine   Power      Seats  New_Price
0  0           Maruti Wagon R LXI CNG            Mumbai      2010  72000              CNG        Manual        First       26.6 km/kg  998 CC   58.16 bhp  5.0    NaN
1  1           Hyundai Creta 1.6 CRDi SX Option  Pune        2015  41000              Diesel     Manual        First       19.67 kmpl  1582 CC  126.2 bhp  5.0    NaN
2  2           Honda Jazz V                      Chennai     2011  46000              Petrol     Manual        First       18.2 kmpl   1199 CC  88.7 bhp   5.0    8.61 Lakh
3  3           Maruti Ertiga VDI                 Chennai     2012  87000              Diesel     Manual        First       20.77 kmpl  1248 CC  88.76 bhp  7.0    NaN
4  4           Audi A4 New 2.0 TDI Multitronic   Coimbatore  2013  40670              Diesel     Automatic     Second      15.2 kmpl   1968 CC  140.8 bhp  5.0    NaN
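To make the problem reproducible without the full CSV, the five rows above can be rebuilt directly. This is a convenience sketch: the values are copied from the head, and the variable name head is my own choice. The dtypes show that Mileage, Engine and Power are stored as text, which is why a scaler cannot consume them before they are converted.

```python
import numpy as np
import pandas as pd

head = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "Name": ["Maruti Wagon R LXI CNG", "Hyundai Creta 1.6 CRDi SX Option", "Honda Jazz V",
             "Maruti Ertiga VDI", "Audi A4 New 2.0 TDI Multitronic"],
    "Location": ["Mumbai", "Pune", "Chennai", "Chennai", "Coimbatore"],
    "Year": [2010, 2015, 2011, 2012, 2013],
    "Kilometers_Driven": [72000, 41000, 46000, 87000, 40670],
    "Fuel_Type": ["CNG", "Diesel", "Petrol", "Diesel", "Diesel"],
    "Transmission": ["Manual", "Manual", "Manual", "Manual", "Automatic"],
    "Owner_Type": ["First", "First", "First", "First", "Second"],
    "Mileage": ["26.6 km/kg", "19.67 kmpl", "18.2 kmpl", "20.77 kmpl", "15.2 kmpl"],
    "Engine": ["998 CC", "1582 CC", "1199 CC", "1248 CC", "1968 CC"],
    "Power": ["58.16 bhp", "126.2 bhp", "88.7 bhp", "88.76 bhp", "140.8 bhp"],
    "Seats": [5.0, 5.0, 5.0, 7.0, 5.0],
    "New_Price": [np.nan, np.nan, "8.61 Lakh", np.nan, np.nan],
})

# Mileage, Engine and Power are non-numeric (text) columns.
print(head.dtypes)
```

A frame like this could stand in for X_train_copy when running the pipelines from the question.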

I tried adding a scaler to a pipeline that works fine without it, but I cannot get it to work once the scaler is added.
