How to fit column wise ordinal encoding

Question

I have a dataframe like the one shown below:

import numpy as np
import pandas as pd

tdf = pd.DataFrame({'grade': np.random.choice(list('AAAD'),size=(5)),
                   'dash': np.random.choice(list('PPPS'),size=(5)),
                   'dumeel': np.random.choice(list('QWRR'),size=(5)),
                   'dumma': np.random.choice((1234),size=(5)),
                   'target': np.random.choice([0,1],size=(5))
})

I would like to get a mapping dictionary based on the ordinal encoding technique, as shown here:

from feature_engine.encoding import OrdinalEncoder
from sklearn.model_selection import train_test_split
X = tdf.drop(['target'], axis=1)
y = tdf.target
train_t, test_t, y_train, y_test = train_test_split(X, y, 
                                                test_size=0.25,
                                                random_state=0)
cat_list= tdf.select_dtypes(include=['object']).columns.tolist()
ordinal_encoders = {}
for col in cat_list:
    print(col)
    ordi = OrdinalEncoder(encoding_method='ordered')
    ordinal_encoders[col] = ordi
    ordi.fit(train_t[col], y_train)
    train_t[col] = ordi.transform(train_t[col])

However, I get the error below:

TypeError: X is not a pandas dataframe. The dataset should be a pandas dataframe.

How can I fit and transform the ordinal encoder on a column-by-column basis? I am able to get the encoders initialized as shown below, but I am unable to fit and transform them:

{'grade': OrdinalEncoder(),
 'dash': OrdinalEncoder(),
 'dumeel': OrdinalEncoder()}

I would like to do it this way because later I want to get the mapping dictionary (the ordinal value for each of the categories) and store it in a dictionary.

Simon Hawe · Accepted Answer

scikit-learn has a ColumnTransformer for exactly this case. It lets you specify several transformers together with the columns each one should be applied to. In code, that would read roughly like this:

from sklearn.compose import ColumnTransformer
# remainder="passthrough" leaves every column not listed here untouched.
transformer = ColumnTransformer(transformers=[('ord', OrdinalEncoder(encoding_method='ordered'), ['grade', 'dash', 'dumeel'])], remainder="passthrough")
# encoding_method='ordered' ranks categories using the target, so the target is passed to fit as well.
transformed = transformer.fit_transform(train_t, y_train)
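Since the end goal in the question is a mapping dictionary per column, here is a minimal pandas-only sketch of that idea. It is a hand-rolled approximation of what an "ordered" ordinal encoding does (categories ranked by their mean target, lowest first), not feature_engine's own implementation. The helper `ordered_mapping` and the small fixed frame are hypothetical and only there for illustration, and ties between categories may be broken differently by the library.

```python
import pandas as pd

# Small fixed frame standing in for train_t / y_train from the question
# (hypothetical data, chosen so the result is easy to check by hand).
train_t = pd.DataFrame({
    "grade": ["A", "A", "D", "D", "A", "D"],
    "dash": ["S", "P", "P", "P", "S", "P"],
})
y_train = pd.Series([0, 1, 1, 1, 0, 1], name="target")


def ordered_mapping(column: pd.Series, target: pd.Series) -> dict:
    """Rank categories by mean target (ascending) and number them 0..k-1."""
    means = target.groupby(column).mean().sort_values(kind="stable")
    return {category: rank for rank, category in enumerate(means.index)}


# One mapping dictionary per categorical column.
mappings = {col: ordered_mapping(train_t[col], y_train) for col in train_t.columns}

# Apply the stored mappings column by column.
encoded = train_t.copy()
for col, mapping in mappings.items():
    encoded[col] = encoded[col].map(mapping)

print(mappings)
```

If you stay with feature_engine, the learned mapping should be available on the fitted encoder as `encoder_dict_`. For the loop in the question, selecting the column with double brackets (`train_t[[col]]`) passes a one-column dataframe instead of a Series, which is what the error message is asking for.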
