boot-scootin

Reputation: 12515

Use ColumnTransformer.get_feature_names to create a reverse feature mapping

Suppose I have some DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': list('abcde'),
        'b': list('aaabb'),
    }
)

And I want to use a sklearn.compose.ColumnTransformer to transform it:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

transformer = ColumnTransformer(
    [
        ('a', OneHotEncoder(), ['a']),
        ('b', OneHotEncoder(), ['b']),
    ]
)

transformer.fit(df)

I can get the feature names from this transformer like so:

transformer.get_feature_names()
# ['a__x0_a', 'a__x0_b', 'a__x0_c', 'a__x0_d', 'a__x0_e', 'b__x0_a', 'b__x0_b']

But how can I get a mapping from the original "parent" feature to each "child" feature?

Upvotes: 1

Views: 3318

Answers (1)

boot-scootin

Reputation: 12515

Try this:

>>> import re
>>> from sklearn.base import BaseEstimator
>>> from sklearn.impute import SimpleImputer
>>> # Keep only fitted estimators; skip string placeholders such as 'drop' or 'passthrough'.
>>> transformers = [
...     (feature, t_inst, columns)
...     for feature, t_inst, columns in transformer.transformers_
...     if isinstance(t_inst, BaseEstimator)
... ]
>>> full_mapping = {}
>>> for feature, t_inst, columns in transformers:
...     if isinstance(t_inst, OneHotEncoder):
...         # The encoder reports names like 'x0_a'; swap the generic 'x0' prefix for the column name.
...         feature_names = [re.sub('^x0', feature, x) for x in t_inst.get_feature_names()]
...     elif isinstance(t_inst, SimpleImputer):
...         # SimpleImputer has no get_feature_names(); assume it passes its input columns through.
...         feature_names = list(columns)
...     else:
...         raise ValueError(f'Transformer type {t_inst.__class__.__name__} not supported')
...     full_mapping[feature] = feature_names
...
>>> full_mapping
{'a': ['a_a', 'a_b', 'a_c', 'a_d', 'a_e'], 'b': ['b_a', 'b_b']}

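Note the use of re.sub to replace the generic x0 prefix, which comes from OneHotEncoder.get_feature_names() rather than from sklearn.compose.ColumnTransformer itself, with the actual column name. This works here because each transformer is named after the single column it encodes. Also be aware that get_feature_names() was deprecated in scikit-learn 1.0 and removed in 1.2 in favor of get_feature_names_out(), so on newer versions the calls above need to be adapted.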
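
If you only need the grouping and not the cleaned-up names, a lighter option is to parse the prefixed names directly. The output in the question shows each name as the transformer name, a double underscore, then the encoder's own feature name, so splitting on the first double underscore recovers the parent. This is only a sketch under that assumption: group_by_parent is a hypothetical helper, not part of scikit-learn, and it will misgroup names if a transformer name itself contains a double underscore.

```python
from collections import defaultdict


def group_by_parent(feature_names, sep="__"):
    """Group prefixed feature names by the text before the first `sep`."""
    mapping = defaultdict(list)
    for name in feature_names:
        parent, found, child = name.partition(sep)
        if not found:
            # No separator present: treat the name as its own parent.
            child = name
        mapping[parent].append(child)
    return dict(mapping)


# Names copied from the transformer.get_feature_names() output in the question.
names = ['a__x0_a', 'a__x0_b', 'a__x0_c', 'a__x0_d', 'a__x0_e', 'b__x0_a', 'b__x0_b']
mapping = group_by_parent(names)
print(mapping)
# {'a': ['x0_a', 'x0_b', 'x0_c', 'x0_d', 'x0_e'], 'b': ['x0_a', 'x0_b']}
```

The child names keep the encoder's x0 prefix here; apply the same re.sub step as above if you want a_a, a_b, and so on.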

Upvotes: 1
