Reputation: 5753
import numpy as np
import pandas as pd
import category_encoders as ce
ord_Ce = ce.ordinal.OrdinalEncoder()
ord_Ce.fit_transform(pd.DataFrame([2, np.nan, 3]).astype(object))
This produces the encoding 2->1, np.nan->0, 3->3.
Why does the encoder skip the code 2? It seems to me that 3 should be encoded as 2. Can anyone shed light on this behavior?
Upvotes: 0
Views: 53
Reputation: 5437
Skimming through the source code reveals that, in the ordinal_encoding function, the categories are enumerated (3 categories in total, starting from 1) and these numbers are then used as the encoding:
categories_dict = {x: i + 1 for i, x in enumerate(categories)}
X[str(col) + '_tmp'] = X[col].map(lambda x: categories_dict.get(x))
Afterwards, missing values are imputed to 0.
In summary: np.nan is initially assigned category 2, which is then replaced by 0 in a post-processing step. That is why the code 2 never shows up in the output and 3 keeps the code 3.
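To make the two steps concrete, here is a minimal pure-Python sketch of the behavior described above. It is not the library's actual code: the function names ordinal_encode_sketch and is_missing are hypothetical, and it only mimics the "enumerate from 1, then set missing values to 0" logic for the version discussed here. Other versions of category_encoders may handle missing values differently.

```python
import math


def is_missing(x):
    # Hypothetical helper: treat float NaN as the only "missing" marker.
    return isinstance(x, float) and math.isnan(x)


def ordinal_encode_sketch(values):
    # Step 1: enumerate categories in order of first appearance, starting at 1.
    # NaN is counted as a category too, so it consumes one of the codes.
    codes = {}            # non-missing category -> code
    next_code = 1
    missing_code = None   # code that NaN would have received
    for v in values:
        if is_missing(v):
            if missing_code is None:
                missing_code = next_code
                next_code += 1
        elif v not in codes:
            codes[v] = next_code
            next_code += 1

    # Step 2 (post-processing): missing values are set to 0,
    # which leaves a gap where NaN's code used to be.
    return [0 if is_missing(v) else codes[v] for v in values]


result = ordinal_encode_sketch([2, float("nan"), 3])
print(result)
```

With the input from the question, NaN takes the code 2 in the first step and is overwritten with 0 in the second, so the remaining values keep the codes 1 and 3.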
Upvotes: 2