Reputation: 111
I want to OneHotEncode a pd.DataFrame with missing values.When I try to OneHotEncode, it throws an error regarding missing values.
ValueError: Input contains NaN
When I try to use a SimpleImputer to fix missing values, it throws an error regarding categorical data
ValueError: Cannot use mean strategy with non-numeric data: could not convert string to float: 'RH'
I can't apply OneHotEncoding because of missing values and SimpleImputer because of categorical data. Is there a way around this besides dropping columns or rows?
Upvotes: 0
Views: 334
Reputation: 88
You can use either of the below 2 methods to eliminate NaN categorical values -
Option 1: Replace the missing values with the most frequent category. For instance, if you have a column with 51% values belonging to one category then use the below code to fill missing values of that category
df['col_name'].fillna('most_frequent_category',inplace=True)
Option 2: If you don't wish to impute missing values to the most frequent category then you can create a new category called 'Other' (or similar neutral category relevant to your variable)
df['col_name'].fillna('Other',inplace=True)
Both these methods will impute your missing categorical values and then you will be able to OneHotEncode them.
Upvotes: 2