Reputation: 7713
I have a dataframe like the one shown below:
import numpy as np
import pandas as pd
tdf = pd.DataFrame({'grade': np.random.choice(list('AAAD'),size=(5)),
                    'dash': np.random.choice(list('PPPS'),size=(5)),
                    'dumeel': np.random.choice(list('QWRR'),size=(5)),
                    'dumma': np.random.choice((1234),size=(5)),
                    'target': np.random.choice([0,1],size=(5))
                    })
I would like to get a mapping dictionary based on the ordinal encoding technique, as shown here:
from feature_engine.encoding import OrdinalEncoder
from sklearn.model_selection import train_test_split
X = tdf.drop(['target'], axis=1)
y = tdf.target
train_t, test_t, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.25,
                                                    random_state=0)
cat_list= tdf.select_dtypes(include=['object']).columns.tolist()
ordinal_encoders = {}
for col in cat_list:
    print(col)
    ordi = OrdinalEncoder(encoding_method='ordered')
    ordinal_encoders[col] = ordi
    ordi.fit(train_t[col], y_train)
    train_t[col] = ordi.transform(train_t[col])
However, I get the following error:
TypeError: X is not a pandas dataframe. The dataset should be a pandas dataframe.
How can I fit and transform the ordinal encoder on a column-by-column basis? I am able to initialize the encoders as shown below, but I am unable to fit and transform them:
{'grade': OrdinalEncoder(),
'dash': OrdinalEncoder(),
'dumeel': OrdinalEncoder()}
I would like to do it this way because I later want to get the mapping (the ordinal value for each of the categories) and store it in a dictionary.
Upvotes: 0
Views: 1086
Reputation: 4539
scikit-learn has a so-called ColumnTransformer for that exact case. There you can specify various transformers and the columns they should be applied to.
In code, that would read roughly like this:
from feature_engine.encoding import OrdinalEncoder
from sklearn.compose import ColumnTransformer
# remainder="passthrough" leaves all columns not listed here untouched.
transformer = ColumnTransformer(transformers=[('ord', OrdinalEncoder(encoding_method='ordered'), ['grade', 'dash', 'dumeel'])], remainder="passthrough")
# The 'ordered' method needs the target, so pass the features and y (X and y from the question).
transformed = transformer.fit_transform(X, y)
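If the mapping dictionary itself is the main goal, here is a minimal pandas-only sketch of what an 'ordered' ordinal encoding computes: each category gets an integer rank according to its ascending mean target value. The helper name ordered_mapping and the toy data are assumptions made for illustration; they are not part of feature_engine or scikit-learn.

```python
import pandas as pd


def ordered_mapping(column: pd.Series, target: pd.Series) -> dict:
    """Rank categories by ascending mean target (hypothetical helper)."""
    means = target.groupby(column).mean().sort_values(kind="stable")
    return {category: rank for rank, category in enumerate(means.index)}


# Small deterministic frame standing in for train_t and y_train (made-up values).
train = pd.DataFrame({
    "grade": ["A", "A", "D", "D", "A"],
    "dash": ["P", "S", "S", "P", "P"],
})
y = pd.Series([0, 0, 1, 1, 1], name="target")

# One mapping dictionary per categorical column, built column by column.
mappings = {col: ordered_mapping(train[col], y) for col in train.columns}

# Apply each column's mapping to get the encoded frame.
encoded = train.apply(lambda s: s.map(mappings[s.name]))
```

As far as I know, a fitted feature_engine encoder exposes the same kind of mapping through its encoder_dict_ attribute. The per-column loop from the question should also work if each column is passed as a one-column dataframe, i.e. train_t[[col]] rather than train_t[col], since the error is raised for a Series.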
Upvotes: 1