The Great

Reputation: 7713

How to fit ordinal encoding column-wise

I have a dataframe as shown below:

import numpy as np
import pandas as pd

tdf = pd.DataFrame({'grade': np.random.choice(list('AAAD'), size=5),
                    'dash': np.random.choice(list('PPPS'), size=5),
                    'dumeel': np.random.choice(list('QWRR'), size=5),
                    'dumma': np.random.choice(1234, size=5),
                    'target': np.random.choice([0, 1], size=5)
})

I would like to get a mapping dictionary based on the ordinal encoding technique, as given here:

from sklearn.model_selection import train_test_split
from feature_engine.encoding import OrdinalEncoder
X = tdf.drop(['target'], axis=1)
y = tdf.target
train_t, test_t, y_train, y_test = train_test_split(X, y, 
                                                test_size=0.25,
                                                random_state=0)
cat_list= tdf.select_dtypes(include=['object']).columns.tolist()
ordinal_encoders = {}
for col in cat_list:
    print(col)
    ordi = OrdinalEncoder(encoding_method='ordered')
    ordinal_encoders[col] = ordi
    ordi.fit(train_t[col], y_train)
    train_t[col] = ordi.transform(train_t[col])

However, I get the error below:

TypeError: X is not a pandas dataframe. The dataset should be a pandas dataframe.

How can I fit and transform the ordinal encoder on a column-by-column basis? I am able to initialize the encoders, as shown below, but I am unable to fit them or transform with them:

{'grade': OrdinalEncoder(),
 'dash': OrdinalEncoder(),
 'dumeel': OrdinalEncoder()}

I would like to do it this way because, later, I want to get the mapping dictionary (the ordinal value for each of the categories) and store it in a dictionary.

Upvotes: 0

Views: 1086

Answers (1)

Simon Hawe

Reputation: 4539

scikit-learn has a so-called ColumnTransformer for exactly this case. With it, you can specify several transformers and the columns each one should be applied to. In code, that would roughly read like this:

from sklearn.compose import ColumnTransformer
from feature_engine.encoding import OrdinalEncoder
transformer = ColumnTransformer(
    transformers=[('ord', OrdinalEncoder(encoding_method='ordered'), ['grade', 'dash', 'dumeel'])],
    remainder="passthrough")  # passthrough: columns not listed above are left untouched
transformed = transformer.fit_transform(X, y)  # X, y as in the question; 'ordered' needs the target
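As for the TypeError in the question: train_t[col] is a pandas Series, whereas train_t[[col]] (double brackets) is a one-column DataFrame, which is what the error message asks for. To make the wanted mapping dictionary concrete, here is a minimal pandas-only sketch of what I understand the 'ordered' method to do: rank the categories of each column by the mean of the target. The helper name ordered_mappings and the small deterministic dataframe are made up for illustration and are not part of feature_engine.

```python
import pandas as pd

def ordered_mappings(X, y, columns):
    """Return {column: {category: rank}}, ranking categories by mean target."""
    mappings = {}
    for col in columns:
        # Mean of the target per category, sorted from lowest to highest
        means = y.groupby(X[col]).mean().sort_values()
        mappings[col] = {cat: rank for rank, cat in enumerate(means.index)}
    return mappings

# Small deterministic stand-in for the question's random dataframe
X = pd.DataFrame({'grade': ['A', 'A', 'D', 'D'],
                  'dash': ['P', 'S', 'S', 'S']})
y = pd.Series([1, 1, 0, 1], name='target')

mappings = ordered_mappings(X, y, ['grade', 'dash'])

# Apply the mappings column by column
encoded = X.assign(**{col: X[col].map(m) for col, m in mappings.items()})
```

Here mappings is a plain nested dictionary keyed by column name, which is the shape asked for in the question. With feature_engine itself, the fitted encoder should expose the same kind of dictionary through its encoder_dict_ attribute, so a hand-rolled version like this is mainly useful for checking what the encoder learned.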

Upvotes: 1
