Reputation: 1135
Is it possible to use a categorical variable, as-is, in Python/Scikit-learn GLM models? I do realize the alternative of one-hot encoding. My issue with this approach is that I will be unable to test the entire variable for significance. I can only test the encoded variable (which is partial).
Why is it that SAS can handle such a variable and not Python? Please advise.
Upvotes: 1
Views: 1146
Reputation: 8811
It actually depends on the data that you have. For instance, if you can assign some sort of order to the categorical variable (Ordinal Values) like low
,medium
and high
, you can assign them numbers like 1, 2 and 3. However, it gets a little trickier if there is no order whatsoever. Besides One-hot Encoding, you can try Helmert Coding Scheme. You can also read this blog post for more analysis. There are also various other coding schemes in sklearn for categorical variables:
You can read more about other Categorical Encoders in Sklearn here.
Upvotes: 2