Reputation: 723
I trained a classification model on two features, A and B. Feature A is more important than feature B. Feature A is ordinal, so I label encoded it, and its values range from 1 to 5. Feature B is also categorical; I label encoded it and then one-hot encoded it.
As a result of this encoding, feature A takes values from 1 to 5, whereas feature B is spread across multiple columns, each holding either 0 or 1.
After training, my model is heavily skewed towards feature A, since its values range from 1 to 5, and it pays very little attention to feature B.
If I scale the features with StandardScaler, feature A ends up centered around 0 with unit variance (values of roughly -1 to 1), and after training feature B plays a bigger role in the decision than feature A.
Is there a better way to scale both features so that feature A keeps an edge, but not so much that feature B is completely ignored?
Upvotes: 1
Views: 76
Reputation: 731
Once you one-hot encode, the model only sees a flat set of feature columns; it does not know which columns came from A and which from B. You can then calculate feature importances, or run a feature selection algorithm, to see which columns matter and make the model more efficient.
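To make the feature importance suggestion concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the data is synthetic, the column names `A` and `B`, the category values, and the choice of a random forest are hypothetical, not taken from the question. It sums the importances of B's one-hot columns so that A and B can be compared as whole features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): A is ordinal 1..5, B is nominal.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "A": rng.integers(1, 6, size=n),
    "B": rng.choice(["red", "green", "blue"], size=n),
})
# Hypothetical target that depends on both features.
y = ((df["A"] >= 4) | (df["B"] == "red")).astype(int)

# One-hot encode B; A stays as a single ordinal column.
X = pd.get_dummies(df, columns=["B"], prefix="B", dtype=float)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Per-column importances, then summed per original feature
# ("B_red", "B_green", "B_blue" all map back to "B").
importances = pd.Series(model.feature_importances_, index=X.columns)
grouped = importances.groupby(lambda col: col.split("_")[0]).sum()
print(grouped)
```

Impurity-based importances like these are only one possible measure; permutation importance on a held-out set is a common alternative.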
However, if you feel feature A is more important, try scaling it to limits other than -1 to 1 so that it keeps a larger range than feature B's columns, or scale both accordingly. But again, the model sees only a set of features, so to improve performance try changing the model or its parameters rather than focusing on this.
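As a rough sketch of scaling to other limits, scikit-learn's `MinMaxScaler` accepts a `feature_range`, so feature A can be mapped to a range somewhat wider than the 0/1 columns of B. The upper bound `weight = 2.0` is a hypothetical value chosen for illustration; it would need tuning. Note that this only affects scale-sensitive models (for example distance-based or regularized linear models); tree-based models are unaffected by monotonic rescaling of a single feature.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Feature A as a single column with its ordinal codes 1..5.
a = np.array([[1], [2], [3], [4], [5]], dtype=float)

# Hypothetical upper bound (assumption): B's one-hot columns span 0..1,
# so 0..2 gives A twice their range without dwarfing them.
weight = 2.0
scaler = MinMaxScaler(feature_range=(0.0, weight))
a_scaled = scaler.fit_transform(a)

print(a_scaled.ravel())
```

Treating `weight` as a hyperparameter and picking it by cross-validation is one way to find a balance between the two features.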
Upvotes: 1