eduardokapp

Reputation: 1751

How to retrieve the mapping generated by a category_encoders encoder in Python?

I'm using the category_encoders package in Python to apply its Weight of Evidence encoder (WOEEncoder).

After I define an encoder object and fit it to the data, the columns I want to encode are correctly replaced by their Weight of Evidence (WoE) values, according to the category each row belongs to.

So my question is: how can I obtain the mapping learned by the encoder? For example, say I have a variable with categories "A", "B" and "C", whose WoE values are 0.2, -0.4 and 0.02. How can I tell that 0.2 corresponds to category "A"?
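For reference, here is a minimal sketch of what such a label → WoE table looks like when plain weight of evidence is computed directly with pandas, so the result is keyed by the category labels from the start. It assumes a binary 0/1 target and uses the convention WoE = ln(share of all positives in the category / share of all negatives in the category); the toy data and the `woe_by_label` function are hypothetical. WOEEncoder additionally applies a regularization term (its `regularization` parameter), so its numbers will not match these exactly.

```python
import numpy as np
import pandas as pd


def woe_by_label(x: pd.Series, y: pd.Series) -> dict:
    """Plain (unregularized) weight of evidence per category of x.

    Hypothetical helper, not part of category_encoders. y must be binary 0/1,
    and every category needs at least one positive and one negative,
    otherwise the log is undefined.
    """
    stats = y.groupby(x).agg(["sum", "count"])
    positives = stats["sum"]
    negatives = stats["count"] - stats["sum"]
    # Share of all positives / share of all negatives falling in each category
    pos_share = positives / positives.sum()
    neg_share = negatives / negatives.sum()
    return np.log(pos_share / neg_share).to_dict()


# Hypothetical toy data with the categories "A", "B" and "C"
x = pd.Series(["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"])
y = pd.Series([1, 1, 0, 0, 0, 1, 1, 0, 1, 0])
print(woe_by_label(x, y))
```

With this toy data, "A" holds 2 of the 5 positives and 1 of the 5 negatives, so it maps to ln 2 ≈ 0.693; "B" maps to -ln 2 and "C" to 0.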

I tried accessing the mapping attribute:

import category_encoders
# data, cols and label_col come from my own dataset
encoder = category_encoders.WOEEncoder().fit(X=data[cols], y=data[label_col])
print(encoder.mapping)

It does give me a mapping, but I'm not sure in what order the WoE values are listed. They seem to be in decreasing order, but that still doesn't tell me which category each value belongs to.

Upvotes: 1

Views: 1456

Answers (2)

vpap

Reputation: 1547

Adding to the other answer, assume a categorical variable BusinessTravel with three categories: Travel_Rarely, Travel_Frequently and Non-Travel.

This is how to get the mapping between the original string categories and their WoE scores, where enc is the fitted WOEEncoder:

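feat = 'BusinessTravel'
# left: WoE values, indexed by the ordinal code of each category
left = enc.mapping[feat]
# right: ordinal codes, indexed by the original category label
right = [x['mapping'] for x in enc.ordinal_encoder.mapping if x['col'] == feat][0]
# Match entries on the ordinal code to get label -> WoE
joined = {label: woe for (code, woe) in left.items() for (label, ord_code) in right.items() if code == ord_code}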

Example result (the nan entry is added by the encoder):

{'Travel_Rarely': -0.09036324051329503,
 'Travel_Frequently': 0.5485236872151148,
 'Non-Travel': -0.727161878538588,
 nan: 0.0}

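The two mappings involved, left and right, have their index and values transposed: left maps ordinal code to WoE, while right maps category label to ordinal code. That is why it is easiest to traverse both with their items() method and match on the ordinal code.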
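Here is a self-contained sketch of that join, using hand-made Series shaped like the two mappings so it runs without fitting an encoder. The ordinal codes are hypothetical (the real ones depend on the order in which categories appear in the training data), as is the assumption that the encoder reserves code -1 for unknown and -2 for missing values; the WoE values are the ones from the example result above. It uses a slightly simpler single-pass form of the same comprehension, and also shows that `right.map(left)`, the approach the other answer uses, yields the same pairs.

```python
import numpy as np
import pandas as pd

# left: WoE value indexed by ordinal code (same shape as enc.mapping[feat]).
# The codes are hypothetical; -1 and -2 are assumed unknown/missing slots.
left = pd.Series(
    [-0.09036324051329503, 0.5485236872151148, -0.727161878538588, 0.0, 0.0],
    index=[1, 2, 3, -1, -2],
)

# right: ordinal code indexed by category label (same shape as the 'mapping'
# entry in enc.ordinal_encoder.mapping). Codes are hypothetical.
right = pd.Series(
    [1, 2, 3, -2],
    index=["Travel_Rarely", "Travel_Frequently", "Non-Travel", np.nan],
)

# Join on the ordinal code: for each label, look up its code's WoE in left.
joined = {label: left[code] for label, code in right.items()}

# Vectorised equivalent: Series.map looks each code up in left's index.
joined_series = right.map(left)
print(joined_series)
```

The unknown slot (-1) never appears in the result, because no label in right points to it.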

Upvotes: 1

Ben Reiniger

Reputation: 12602

From the source code, you can see that an OrdinalEncoder (the category_encoders version, not sklearn's) is used to convert categories to integers before the WoE encoding. That object is available through the ordinal_encoder attribute, and it in turn has a mapping (or category_mapping) attribute: a list of dictionaries, one per encoded column, each holding the column name under 'col' and the label-to-integer mapping under 'mapping'.

The format of those mapping attributes isn't particularly pleasant, but here's a stab at "composing" the two for a given feature:

from category_encoders import WOEEncoder
from sklearn.datasets import fetch_openml

titanic = fetch_openml('titanic', version=1, as_frame=True)
df = titanic['frame']

woe = WOEEncoder().fit(df[['sex', 'embarked']], df['survived'])

column = "embarked"
# WoE values, indexed by ordinal code
woe_map = woe.mapping[column]
# Ordinal codes, indexed by the original category label
ord_map = [
    d for d in woe.ordinal_encoder.mapping if d['col'] == column
][0]['mapping']

# Replace each label's ordinal code with the WoE value for that code
ord_map.map(woe_map)
# outputs:
# S     -0.215117
# C      0.701157
# NaN    1.578280
# Q     -0.095696
# dtype: float64
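To get the label → WoE table for every encoded column at once, the same composition can be wrapped in a small helper. This is a sketch: `woe_tables` is a hypothetical name, and it relies only on the two attribute shapes used above (`woe.mapping` as a dict of Series keyed by ordinal code, and `woe.ordinal_encoder.mapping` as a list of `{'col': ..., 'mapping': ...}` dicts). To keep it runnable without downloading data, the example feeds it made-up structures of that shape; with a fitted encoder you would call `woe_tables(woe.mapping, woe.ordinal_encoder.mapping)`.

```python
import pandas as pd


def woe_tables(woe_mapping, ordinal_mapping):
    """Hypothetical helper: {column: Series of WoE indexed by category label}.

    woe_mapping: dict of column -> Series (ordinal code -> WoE)
    ordinal_mapping: list of {'col': column, 'mapping': Series (label -> code)}
    """
    tables = {}
    for entry in ordinal_mapping:
        col = entry["col"]
        # Translate each label's ordinal code into that code's WoE value
        tables[col] = entry["mapping"].map(woe_mapping[col])
    return tables


# Made-up structures with the same shape as a fitted encoder's attributes;
# the codes and WoE values are for illustration only.
woe_mapping = {
    "sex": pd.Series([1.2, -1.5], index=[1, 2]),
    "embarked": pd.Series([-0.2, 0.7, 0.1], index=[1, 2, 3]),
}
ordinal_mapping = [
    {"col": "sex", "mapping": pd.Series([1, 2], index=["female", "male"])},
    {"col": "embarked", "mapping": pd.Series([1, 2, 3], index=["S", "C", "Q"])},
]

tables = woe_tables(woe_mapping, ordinal_mapping)
print(tables["embarked"])

# One long-format Series (column, category) -> woe, handy for inspection
long = pd.concat(tables, names=["column", "category"]).rename("woe")
```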

Upvotes: 2
