ashir

Reputation: 111

Applying OneHotEncoding on categorical data with missing values

I want to OneHotEncode a pd.DataFrame that has missing values. When I try to OneHotEncode it, it throws an error about the missing values:

ValueError: Input contains NaN

When I try to use a SimpleImputer to fill in the missing values, it throws an error about the categorical data:

ValueError: Cannot use mean strategy with non-numeric data: could not convert string to float: 'RH'

I can't apply OneHotEncoding because of the missing values, and I can't apply SimpleImputer because of the categorical data. Is there a way around this besides dropping columns or rows?

Upvotes: 0

Views: 334

Answers (1)

Ishita Shah

Reputation: 88

You can use either of the two methods below to fill in missing (NaN) categorical values:

Option 1: Replace the missing values with the most frequent category (the column's mode). For instance, if 51% of the values in a column belong to one category, use the code below to fill the missing values with that category:

df['col_name'] = df['col_name'].fillna(df['col_name'].mode()[0])
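A minimal sketch of Option 1 applied to several columns at once, assuming a small made-up DataFrame (the column names `MSZoning` and `Alley` and their values are hypothetical). It first prints each column's mode and that mode's share of the non-missing values, as one way to check whether a single category really dominates, and then fills every NaN with its own column's mode:

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column names and values are illustrative only.
df = pd.DataFrame({
    "MSZoning": ["RL", "RH", np.nan, "RL", "RL"],
    "Alley": [np.nan, "Grvl", "Pave", "Grvl", np.nan],
})

# Mode of each column and its share of the non-missing values
# (value_counts drops NaN by default).
for col in df.columns:
    share = df[col].value_counts(normalize=True).iloc[0]
    print(col, df[col].mode()[0], round(share, 2))

# df.mode() has one row per tied mode; iloc[0] takes the first one.
# fillna with a Series fills each column with the value under its label.
filled = df.fillna(df.mode().iloc[0])
print(filled)
```

Assigning the result back, rather than calling `fillna(..., inplace=True)` on a selected column, avoids the chained-assignment warning that recent pandas versions raise for that pattern.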

Option 2: If you don't want to assign the missing values to the most frequent category, you can create a new category called 'Other' (or a similar neutral label that makes sense for your variable):

df['col_name'] = df['col_name'].fillna('Other')
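The SimpleImputer error in the question comes from its default `strategy="mean"`, which only works on numeric data. SimpleImputer's `strategy="constant"` and `strategy="most_frequent"` also accept string columns, so both options above can be done with scikit-learn as well. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical example data; column names and values are illustrative only.
df = pd.DataFrame({
    "MSZoning": ["RL", "RH", np.nan, "RL"],
    "Alley": [np.nan, "Grvl", "Pave", "Grvl"],
})

# Option 2 equivalent: every NaN becomes the constant fill_value.
other_imputer = SimpleImputer(strategy="constant", fill_value="Other")
out_other = other_imputer.fit_transform(df)
print(out_other)

# Option 1 equivalent: every NaN becomes its column's most frequent value.
mode_imputer = SimpleImputer(strategy="most_frequent")
out_mode = mode_imputer.fit_transform(df)
print(out_mode)
```

Both return a NumPy array of objects rather than a DataFrame, so wrap the result in `pd.DataFrame(..., columns=df.columns)` if you need the column names back.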

Either method fills in the missing categorical values, after which the column can be one-hot encoded without the "Input contains NaN" error.
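Once the NaNs are filled, OneHotEncoder runs normally. Here is a sketch that chains the two steps in a scikit-learn Pipeline, so the same imputation is reapplied to any new data before encoding (the data is hypothetical, and `fill_value="Other"` mirrors Option 2):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical example data; column names and values are illustrative only.
df = pd.DataFrame({
    "MSZoning": ["RL", "RH", np.nan, "RL"],
    "Alley": [np.nan, "Grvl", "Pave", "Grvl"],
})

# Impute first so OneHotEncoder never sees NaN, then encode.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros
# instead of raising an error at transform time.
pipe = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="Other"),
    OneHotEncoder(handle_unknown="ignore"),
)

# OneHotEncoder returns a sparse matrix by default; densify it for printing.
encoded = pipe.fit_transform(df).toarray()
print(pipe.named_steps["onehotencoder"].categories_)
print(encoded)
```

With this data the learned categories are `['Other', 'RH', 'RL']` and `['Grvl', 'Other', 'Pave']`, so each row has six columns and exactly two ones.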

Upvotes: 2
