Hugo

Reputation: 1698

How to use Pandas get_dummies on predict data?

After using Pandas get_dummies on 3 categorical columns to get a one-hot-encoded DataFrame, I've trained (with some success) a Perceptron model.

Now I would like to predict the result for a new observation that is not one-hot encoded.

Is there any way to record the get_dummies column mapping to re-use it?

Upvotes: 5

Views: 3461

Answers (1)

Mischa Lisovyi

Reputation: 3223

To my knowledge, there is no automatic procedure for this at the moment. In a future release of sklearn, CategoricalEncoder will be very handy for this job. You can already get your hands on it if you clone the sklearn GitHub master branch and build it yourself. At the moment, two options come to mind:

  • use a LabelEncoder + OneHotEncoder combination; see this answer, for example;
  • simply retrieve (and store, if needed) the list of columns from the training OHE output. Then run pd.get_dummies on the test set or new example. Loop through the resulting test OHE columns, drop those that do not appear in the training OHE, and add those that are missing from the test OHE, filled with zeros.
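The second option can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the column names (`color`, `size`) and their values are made up for the example, and instead of an explicit loop it uses `DataFrame.reindex`, which drops columns not in the training list and adds the missing ones filled with zeros in a single call.

```python
import pandas as pd

# Hypothetical training data with two categorical columns (names are assumptions).
train = pd.DataFrame({
    "color": ["red", "green", "blue"],
    "size": ["S", "M", "L"],
})

# One-hot encode the training data and remember the resulting column list.
# dtype=int keeps every dummy column as 0/1 integers.
train_ohe = pd.get_dummies(train, dtype=int)
train_columns = list(train_ohe.columns)  # store this (e.g. pickle it) with the model

# New observations: "purple" was never seen during training.
new_obs = pd.DataFrame({
    "color": ["green", "purple"],
    "size": ["M", "S"],
})
new_ohe = pd.get_dummies(new_obs, dtype=int)

# Align to the training layout: unseen columns (color_purple) are dropped,
# columns missing from the new data are added and filled with zeros,
# and the column order matches the training order.
aligned = new_ohe.reindex(columns=train_columns, fill_value=0)

print(aligned)
```

The `aligned` frame can then be passed to the trained model's `predict`. Note that a category unseen during training (here `purple`) ends up as all zeros for that feature, which may or may not be what you want.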

Upvotes: 5
