Reputation: 5753

Categorical encoder missing out levels

import numpy as np
import pandas as pd
import category_encoders as ce
ord_Ce = ce.ordinal.OrdinalEncoder()
ord_Ce.fit_transform(pd.DataFrame([2, np.nan, 3]).astype(object))

This produces the encoding 2 -> 1, np.nan -> 0, 3 -> 3. Why does the encoder skip the code 2? I would have expected 3 to be encoded as 2. Can anyone shed light on this behavior?

Upvotes: 0

Answers (1)

Quickbeam2k1

Reputation: 5437

Skimming through the source code reveals that the ordinal_encoding function enumerates the categories (3 in total here, numbered starting from 1) and uses these numbers as the encoding:

categories_dict = {x: i + 1 for i, x in enumerate(categories)}
X[str(col) + '_tmp'] = X[col].map(lambda x: categories_dict.get(x))

Afterwards, missing values are imputed to 0.

To summarize: np.nan is assigned category 2, which is then replaced by 0 in a post-processing step. That is why the code 2 never appears in the output.
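
For illustration, here is a minimal pandas-only sketch of that mechanism. It is not the library's actual code: the names col, codes and encoded are made up for this example, and the equality-based assignment in step 2 is an assumption about why the NaN row ends up without a code (NaN never compares equal to itself).

```python
import numpy as np
import pandas as pd

# Same input as in the question: two real values and one missing value.
col = pd.Series([2, np.nan, 3], dtype=object)

# Step 1: number every distinct value in order of appearance, starting at 1.
# NaN counts as a distinct value here, so it takes slot 2.
codes = [(value, i + 1) for i, value in enumerate(col.unique())]

# Step 2 (assumed for illustration): assign codes by equality.
# NaN != NaN, so the NaN row is never matched and stays unassigned.
encoded = pd.Series(np.nan, index=col.index)
for value, code in codes:
    encoded.loc[col == value] = code

# Step 3: impute the still-missing entries with 0.
encoded = encoded.fillna(0).astype(int)

print(encoded.tolist())  # [1, 0, 3]
```

The code 2 is reserved for NaN in step 1 but is overwritten by the 0 imputed in step 3, so 3 keeps the code 3.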

Upvotes: 2
