leandro.starke

Reputation: 11

The features selected by SelectKBest do not match those transformed by ColumnTransformer

I am in the process of deploying a machine learning model for study purposes and I have some questions about it:

  1. My POST method sends my original features (without any transformations applied) to the API.

[Screenshot: untransformed data]
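Here is a minimal sketch of turning such a POSTed JSON body into a one-row DataFrame before it reaches the preprocessor. It assumes the client sends one flat JSON object per prediction. The field values and the helper name `payload_to_frame` are hypothetical.

```python
import json

import pandas as pd


def payload_to_frame(body: str) -> pd.DataFrame:
    """Parse one flat JSON object into a one-row DataFrame keyed by field name (hypothetical helper)."""
    record = json.loads(body)
    # A list holding one dict becomes one row; the dict keys become column names.
    return pd.DataFrame([record])


# Hypothetical request body containing only the four raw features listed in the question.
body = json.dumps({
    "tenure": 12,
    "OnlineSecurity": "No",
    "TechSupport": "Yes",
    "Contract": "Month-to-month",
})

incoming = payload_to_frame(body)
print(incoming.columns.tolist())
```

A ColumnTransformer fitted on a DataFrame with string column selectors looks up its inputs by name. The payload keys therefore have to match the training column names exactly.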

  2. I'm using the same pipeline from the training phase and extracting the ColumnTransformer and the best model from it:
    preprocessor = pipeline.named_steps["columntransformer"]
    model = pipeline.named_steps["xgbclassifier"]

[Screenshot: pipeline]
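This sketch shows how `make_pipeline` produces the step names used above. It assumes a ColumnTransformer → SelectKBest → classifier layout and uses made-up data with the column names from the question. `LogisticRegression` stands in for `XGBClassifier` only so the example runs without xgboost installed. With XGBClassifier, the last key would be `"xgbclassifier"`.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up training data using the raw column names from the question.
X = pd.DataFrame({
    "tenure": [1, 24, 60, 5, 36, 12, 48, 3],
    "OnlineSecurity": ["No", "Yes", "Yes", "No", "No", "Yes", "Yes", "No"],
    "TechSupport": ["No", "Yes", "No", "No", "Yes", "Yes", "Yes", "No"],
    "Contract": ["Month-to-month", "Two year", "Two year", "Month-to-month",
                 "One year", "One year", "Two year", "Month-to-month"],
    "MonthlyCharges": [70.0, 55.5, 90.1, 80.2, 60.0, 45.3, 99.0, 20.1],
    "TotalCharges": [70.0, 1332.0, 5406.0, 401.0, 2160.0, 543.6, 4752.0, 60.3],
})
y = [1, 0, 0, 1, 0, 1, 0, 0]

# make_pipeline names each step after its class name, lowercased.
pipeline = make_pipeline(
    make_column_transformer(
        (StandardScaler(), ["tenure", "MonthlyCharges", "TotalCharges"]),
        (OneHotEncoder(handle_unknown="ignore"), ["OnlineSecurity", "TechSupport", "Contract"]),
    ),
    SelectKBest(f_classif, k=4),
    LogisticRegression(),  # stand-in for XGBClassifier
)
pipeline.fit(X, y)

preprocessor = pipeline.named_steps["columntransformer"]
model = pipeline.named_steps["logisticregression"]
print(list(pipeline.named_steps))
```

Calling `pipeline.predict(frame)` runs all three steps in order, so `frame` needs the same raw columns the pipeline was fitted on.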

  3. Inside the API, I take the POSTed data and try to transform it with the same preprocessor used in the pipeline, but I get:
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    <ipython-input-29-f928ce436ece> in <cell line: 15>()
         13 
         14 # preprocessor.fit(df[["tenure", "OnlineSecurity", "TechSupport", "Contract"]])
         16 print(preprocessed_df)
         17 

    17 frames
    /usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in _raise_if_missing(self, key, indexer, axis_name)
       5939 
       5940             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
    -> 5941             raise KeyError(f"{not_found} not in index")
       5942 
       5943     @overload

    KeyError: "['MonthlyCharges', 'TotalCharges'] not in index"
  4. When I check the features selected by SelectKBest, MonthlyCharges and TotalCharges are not among them!
    kbest = final_estimator2.named_steps["selectkbest"].get_support(indices=True)

    used_df = transformed_df_columns.iloc[:, kbest]

[Screenshot: kbest features]

Is there a step I'm forgetting?

I double-checked all of my code and the official documentation.

I'd like to understand why my preprocessor asks for two features that, "in theory", were not selected by SelectKBest during the training phase.

Upvotes: 1

Views: 32
