Reputation: 12515
Suppose I have some DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame(
    {
        'a': list('abcde'),
        'b': list('aaabb')
    }
)
And I want to use a sklearn.compose.ColumnTransformer to transform it:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
transformer = ColumnTransformer(
    [
        ('a', OneHotEncoder(), ['a']),
        ('b', OneHotEncoder(), ['b']),
    ]
)
transformer.fit(df)
I can get the feature names from this transformer like so:
transformer.get_feature_names()
# ['a__x0_a', 'a__x0_b', 'a__x0_c', 'a__x0_d', 'a__x0_e', 'b__x0_a', 'b__x0_b']
But how can I get a mapping from the original "parent" feature to each "child" feature?
Upvotes: 1
Views: 3318
Reputation: 12515
Try this:
>>> from sklearn.base import BaseEstimator
>>> from sklearn.impute import SimpleImputer
>>> import re
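>>> # Keep only fitted estimator instances (skips entries such as 'drop' or 'passthrough')
>>> transformers = [
...     (feature, t_inst, columns)
...     for feature, t_inst, columns in transformer.transformers_
...     if isinstance(t_inst, BaseEstimator)
... ]
>>> full_mapping = {}
>>> for feature, t_inst, columns in transformers:
...     if isinstance(t_inst, OneHotEncoder):
...         # OneHotEncoder names its outputs x0_<category>; swap x0 for the parent name
...         feature_names = [re.sub('^x0', feature, x) for x in t_inst.get_feature_names()]
...     elif isinstance(t_inst, SimpleImputer):
...         # SimpleImputer has no get_feature_names; assumes columns is a list and
...         # the imputer returns its input columns unchanged
...         feature_names = list(columns)
...     else:
...         raise ValueError(f'Transformer type {t_inst.__class__.__name__} not supported')
...     full_mapping[feature] = feature_names
...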
>>> full_mapping
{'a': ['a_a', 'a_b', 'a_c', 'a_d', 'a_e'], 'b': ['b_a', 'b_b']}
Note the use of re.sub to replace the x0 placeholder that OneHotEncoder puts at the start of each feature name with the name of the parent feature.
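If you only need the grouping and not the renaming, a lighter option might be to parse the prefixed names that transformer.get_feature_names() already returns, since each one starts with the transformer name followed by a double underscore. The following is a minimal sketch of that idea; group_by_parent is a hypothetical helper (not part of sklearn), and it assumes that no transformer name itself contains '__'.

```python
from collections import defaultdict


def group_by_parent(feature_names, sep="__"):
    """Group prefixed feature names by the part before the first separator.

    Hypothetical helper, not a sklearn API. Assumes the transformer names
    given to ColumnTransformer do not themselves contain the separator.
    """
    mapping = defaultdict(list)
    for name in feature_names:
        parent, found, child = name.partition(sep)
        if not found:
            # No separator in the name: treat the name as its own parent
            parent, child = name, name
        mapping[parent].append(child)
    return dict(mapping)


# Names copied from the question's transformer.get_feature_names() output
names = ['a__x0_a', 'a__x0_b', 'a__x0_c', 'a__x0_d', 'a__x0_e', 'b__x0_a', 'b__x0_b']
mapping = group_by_parent(names)
print(mapping)
```

This keeps the x0_ part in each child name; the same re.sub call as above could be applied afterwards if you want the parent name in its place.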
Upvotes: 1