Reputation: 1
I have the House Prices - Advanced Regression Techniques dataset and need to apply Lasso and Ridge regularization to it. I saved the training data in a variable named house and ran the following code:
house.info()
Got this output: [image: house.info() output]
Some columns in this dataset have a numerical dtype (int64 or float64) but are actually categorical (both ordinal and nominal).
Can I standardize these categorical variables directly, or should I first convert them to type "object" using house[col_name] = house[col_name].astype(str)
and then one-hot encode them, standardizing only the remaining numerical columns?
Upvotes: 0
Views: 485
Reputation: 1155
When a column is categorical, you can apply one-hot encoding to it. Each category then becomes its own binary (0/1) column.
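import pandas as pd
# One-hot encode the listed columns (get_dummies takes them via `columns`).
# The default prefix_sep already inserts '_' between prefix and category.
raw_df = pd.get_dummies(data=raw_df,
                        columns=['col1', 'col2', 'col3'],
                        prefix=['feature1', 'feature2', 'feature3'])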
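For the workflow asked about in the question, a minimal sketch could look like the following. It assumes MSSubClass is one of the integer-coded nominal columns and LotArea is a genuinely numeric one. The toy values and the names categorical_cols, numeric_cols and encoded are made up for illustration, so substitute the real column lists from house.info().

```python
import pandas as pd

# Toy stand-in for the `house` DataFrame; the values are invented for illustration.
house = pd.DataFrame({
    "MSSubClass": [20, 60, 20, 70],        # integer codes, but really a nominal category
    "LotArea": [8450, 9600, 11250, 9550],  # genuinely numeric
    "SalePrice": [208500, 181500, 223500, 140000],
})

categorical_cols = ["MSSubClass"]  # assumed: numeric-coded categorical columns
numeric_cols = ["LotArea"]         # assumed: true numeric features

# Cast the coded categoricals to strings so they are treated as labels, not magnitudes.
house[categorical_cols] = house[categorical_cols].astype(str)

# Standardize only the true numeric features (zero mean, unit variance).
numeric = house[numeric_cols].astype(float)
house[numeric_cols] = (numeric - numeric.mean()) / numeric.std(ddof=0)

# One-hot encode the categoricals; each category becomes its own indicator column.
encoded = pd.get_dummies(house, columns=categorical_cols, prefix=categorical_cols)
print(encoded.columns.tolist())
```

The indicator columns are left unscaled here, which is one common choice. Ordinal columns could instead be kept as integers if their order is meaningful, but that is a modelling decision rather than a rule.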
Upvotes: 0