I am trying to deploy a model as a .pkl file. While building the pipeline, I am running into a problem. Here is the code that works without trouble:
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, RobustScaler
power_transformer = FunctionTransformer(lambda x: x.applymap(convert_power), validate=False)
mileage_transformer = FunctionTransformer(lambda x: x.applymap(convert_mileage), validate=False)
engine_transformer = FunctionTransformer(lambda x: x.applymap(convert_engine), validate=False)
power_pipeline = Pipeline([
    ('convert_power', power_transformer),
    ('impute', SimpleImputer(strategy='median'))
])
mileage_pipeline = Pipeline([
    ('convert_mileage', mileage_transformer),
    ('impute', SimpleImputer(strategy='median'))
])
engine_pipeline = Pipeline([
    ('convert_engine', engine_transformer),
    ('impute', SimpleImputer(strategy='median'))
])
seats_pipeline = Pipeline([
    ('imputer', seats_imputer)
])
cat_cols = ['Location', 'Fuel_Type', 'Transmission', 'Owner_Type']
# num_cols = ['Year', 'Kilometers_Driven', 'Mileage', 'Engine', 'Power', 'Seats'] scaler to be applied on
preprocessor = ColumnTransformer(
    transformers=[
        ('power', power_pipeline, ['Power']),
        ('mileage', mileage_pipeline, ['Mileage']),
        ('engine', engine_pipeline, ['Engine']),
        ('seats', seats_pipeline, ['Seats']),
        ('categorical', OneHotEncoder(handle_unknown='ignore', drop='first'), cat_cols),
    ],
    remainder='passthrough'  # Keep other columns as is
)
columns_to_drop = ['Unnamed: 0', 'Name', 'New_Price']
full_pipeline = Pipeline([
    ('dropping', FunctionTransformer(lambda df: pd.DataFrame(df.drop(columns=columns_to_drop)))),
    ('preprocessing', preprocessor)
])
transformed_data = full_pipeline.fit_transform(X_train_copy)
This code works fine. However, if I add a scaler inside the ColumnTransformer, it raises an error:
...
    transformers=[
        ('power', power_pipeline, ['Power']),
        ('mileage', mileage_pipeline, ['Mileage']),
        ('engine', engine_pipeline, ['Engine']),
        ('seats', seats_pipeline, ['Seats']),
        ('numerical', RobustScaler(), num_cols),
        ('categorical', OneHotEncoder(handle_unknown='ignore', drop='first'), cat_cols),
    ],
...
ValueError: could not convert string to float: '26.6 km/kg'
This error does not occur with the previous code. Without the scaler, all my values are converted to float and I get no NaN values.
Here are the conversion functions I used:
def convert_power(value: str) -> float:
    if pd.isna(value) or not isinstance(value, str):
        return np.nan
    try:
        return float(value.split()[0])
    except ValueError:
        return np.nan

def convert_engine(value: str) -> float:
    if pd.isna(value) or not isinstance(value, str):
        return np.nan
    try:
        return float(value.split()[0])
    except ValueError:
        return np.nan

def convert_mileage(value: str) -> float:
    if pd.isna(value):
        return np.nan
    result = float(value.split()[0])
    if 'km/kg' in value:
        result *= 1.4
    return result
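Unlike convert_power and convert_engine, convert_mileage has no isinstance check and no try/except, so a non-string or malformed value would raise instead of becoming NaN. A minimal sketch of a more defensive variant follows; the name convert_mileage_safe and the constant KM_PER_KG_FACTOR are hypothetical, and the 1.4 factor is simply copied from the function above. It uses only the standard library, so it returns math.nan rather than np.nan.

```python
import math

KM_PER_KG_FACTOR = 1.4  # hypothetical name; value copied from convert_mileage


def convert_mileage_safe(value) -> float:
    """Hypothetical defensive variant of convert_mileage."""
    # Missing values arrive as None or float NaN, so any non-string maps to NaN.
    if not isinstance(value, str):
        return math.nan
    try:
        result = float(value.split()[0])
    except (ValueError, IndexError):
        # Covers non-numeric prefixes such as 'null kmpl' and empty strings.
        return math.nan
    if "km/kg" in value:
        result *= KM_PER_KG_FACTOR
    return result
```

With this variant, values such as None, an empty string, or 'null kmpl' would end up as NaN and be handled by the SimpleImputer step rather than stopping the pipeline.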
Head of the dataframe I am working with:
   Unnamed: 0  Name                              Location    Year  Kilometers_Driven  Fuel_Type  Transmission  Owner_Type  Mileage     Engine   Power      Seats  New_Price
0  0           Maruti Wagon R LXI CNG            Mumbai      2010  72000              CNG        Manual        First       26.6 km/kg  998 CC   58.16 bhp  5.0    NaN
1  1           Hyundai Creta 1.6 CRDi SX Option  Pune        2015  41000              Diesel     Manual        First       19.67 kmpl  1582 CC  126.2 bhp  5.0    NaN
2  2           Honda Jazz V                      Chennai     2011  46000              Petrol     Manual        First       18.2 kmpl   1199 CC  88.7 bhp   5.0    8.61 Lakh
3  3           Maruti Ertiga VDI                 Chennai     2012  87000              Diesel     Manual        First       20.77 kmpl  1248 CC  88.76 bhp  7.0    NaN
4  4           Audi A4 New 2.0 TDI Multitronic   Coimbatore  2013  40670              Diesel     Automatic     Second      15.2 kmpl   1968 CC  140.8 bhp  5.0    NaN
Adding the scaler is the only change I made: the pipeline worked fine before, but I cannot get it to work with the scaler.
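One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: ColumnTransformer gives each transformer the original input columns, not the output of the other transformers. If num_cols is the list shown in the comment above, the RobustScaler would receive the raw 'Mileage', 'Engine', and 'Power' strings. Below is a minimal sketch of one workaround, which scales inside the per-column pipeline and gives the standalone scaler only columns that are already numeric. The toy data and the names leading_number, to_float_frame, and already_numeric are hypothetical, and the km/kg adjustment is left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, RobustScaler


def leading_number(value):
    """Return the leading number of a string such as '19.67 kmpl', else NaN."""
    if not isinstance(value, str):
        return np.nan
    try:
        return float(value.split()[0])
    except (ValueError, IndexError):
        return np.nan


def to_float_frame(df):
    """Apply leading_number to every cell of a DataFrame.

    A named module-level function is used because the standard pickle
    module cannot serialize a lambda stored in a FunctionTransformer.
    """
    return df.apply(lambda col: col.map(leading_number))


# Toy data loosely modelled on the dataframe head above (hypothetical).
toy = pd.DataFrame({
    "Year": [2010, 2015, 2011, 2012],
    "Kilometers_Driven": [72000, 41000, 46000, 87000],
    "Mileage": ["26.6 km/kg", "19.67 kmpl", None, "20.77 kmpl"],
})

# Convert, impute, then scale, all within the column's own pipeline.
mileage_pipeline = Pipeline([
    ("convert", FunctionTransformer(to_float_frame)),
    ("impute", SimpleImputer(strategy="median")),
    ("scale", RobustScaler()),
])

# Only columns that are numeric in the raw input go to the standalone scaler.
already_numeric = ["Year", "Kilometers_Driven"]

preprocessor = ColumnTransformer([
    ("mileage", mileage_pipeline, ["Mileage"]),
    ("numerical", RobustScaler(), already_numeric),
])

scaled = preprocessor.fit_transform(toy)
```

An alternative under the same assumption would be to keep the first ColumnTransformer unchanged and add the scaler as a later Pipeline step, so that it only ever sees converted values.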