Reputation: 157
In most academic examples, categorical features are converted using get_dummies()
or OneHotEncoder()
. Let's say I want to use Country as a feature and the dataset has 100 unique countries. When we apply get_dummies()
or OneHotEncoder()
to country, we get 100 columns, and the model is trained with 100 country columns + other features.
Let's say we have deployed this model into production and we receive only 10 countries. When we pre-process the data using get_dummies()
or OneHotEncoder()
, the model will fail to predict with "Number of features the model was trained on does not match the features passed", because we are passing 10 country columns + other features.
Can you please help me understand how to handle such scenarios? How can a large number of categorical values across multiple columns be pre-processed during model building?
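To make the mismatch concrete, here is a minimal sketch of the problem described above (the column name 'country' and the label sets are illustrative): get_dummies() builds columns only from the labels present in the data it sees, so the training and production frames end up with different widths.

```python
# get_dummies() derives its columns from the labels it sees, so the
# training frame and the production frame get different shapes.
import pandas as pd

train = pd.DataFrame({'country': ['USA', 'Russia', 'China', 'Spain']})
prod = pd.DataFrame({'country': ['Russia', 'China']})  # only a subset arrives

train_enc = pd.get_dummies(train)
prod_enc = pd.get_dummies(prod)

print(train_enc.shape[1])  # 4 country columns at training time
print(prod_enc.shape[1])   # 2 country columns -> model input mismatch
```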
Upvotes: 1
Views: 2449
Reputation: 1425
The pandas.get_dummies()
function indeed should not be used in deployment, for the reason you described. scikit-learn's OneHotEncoder, though, handles this situation just fine:
from sklearn import preprocessing
import pandas as pd
ohe = preprocessing.OneHotEncoder(handle_unknown='ignore')
X_train = pd.DataFrame({'country':['USA', 'Russia', 'China', 'Spain']})
X_test = pd.DataFrame({'country':['Russia', 'Ukraine', 'China', 'Russia']})
ohe.fit(X_train)
ohe.transform(X_test).toarray()
array([[0., 1., 0., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.]])
(Here I have set handle_unknown='ignore'
so that unseen labels ('Ukraine') are encoded as all zeros. If you set handle_unknown='error'
(which is the default), unseen labels will raise an error.) So, OneHotEncoder can handle a different set of labels in the test set.
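The key point for deployment is that you fit the encoder once on the training data and reuse that same fitted object at prediction time, so the column layout always matches what the model saw. A minimal sketch (the data and the idea of persisting with joblib are illustrative, not from the question):

```python
# Fit the encoder once on training data; reuse the fitted object in
# production so the output width stays fixed at one column per
# training-time category, regardless of which labels arrive later.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X_train = pd.DataFrame({'country': ['USA', 'Russia', 'China', 'Spain']})
X_prod = pd.DataFrame({'country': ['Russia', 'Ukraine']})  # unseen label

ohe = OneHotEncoder(handle_unknown='ignore')
ohe.fit(X_train)

# In a real deployment you would persist and reload the fitted encoder,
# e.g. joblib.dump(ohe, 'ohe.joblib') at training time and
# ohe = joblib.load('ohe.joblib') in the serving code.
encoded = ohe.transform(X_prod).toarray()
print(encoded.shape)  # (2, 4): always one column per training category
```

Because the fitted encoder remembers its categories, the unseen 'Ukraine' row simply becomes all zeros instead of changing the feature count.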
Upvotes: 2