Reputation: 1698
After using Pandas get_dummies on 3 categorical columns to get a one-hot encoded DataFrame, I've trained (with some success) a Perceptron model.
Now I would like to predict the result for a new observation that is not one-hot encoded.
Is there any way to record the get_dummies column mapping so I can re-use it?
Upvotes: 5
Views: 3461
Reputation: 3223
To my knowledge, there is no automatic procedure for this at the moment. In a future release of sklearn, CategoricalEncoder will be very handy for this job. You can already get your hands on it if you clone the sklearn GitHub master branch and build it yourself. At the moment, two options come to mind:
1. A LabelEncoder + OneHotEncoder combination; see this answer, for example.
2. Run pd.get_dummies on the test set/example. Then loop through the resulting test OHE columns, drop those that do not appear in the training OHE, and add those that are missing from the test OHE, filled with zeros.
Upvotes: 5
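The second option in the answer (aligning the test columns to the training columns) can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the column names and values are made up, and it uses DataFrame.reindex in place of the explicit loop the answer describes, which should have the same effect of dropping unseen columns and zero-filling missing ones.

```python
import pandas as pd

# Hypothetical training data with three categorical columns (names are assumptions).
train = pd.DataFrame({
    "color": ["red", "green", "blue"],
    "size": ["S", "M", "L"],
    "shape": ["round", "square", "round"],
})
train_ohe = pd.get_dummies(train)

# Record the column layout produced on the training set so it can be re-used.
ohe_columns = train_ohe.columns

# A new observation, including a category ("purple") never seen in training.
new_obs = pd.DataFrame({"color": ["purple"], "size": ["M"], "shape": ["round"]})
new_ohe = pd.get_dummies(new_obs)

# Align to the training layout: columns absent from training are dropped,
# columns missing from the new data are added and filled with 0.
new_aligned = new_ohe.reindex(columns=ohe_columns, fill_value=0).astype(int)
```

The recorded ohe_columns index could be saved alongside the trained model, so that new_aligned always has the same columns, in the same order, that the model was fitted on.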