Reputation: 189
I want to perform multiple linear regression in Python with lasso. I am not sure whether the input observation matrix X can contain categorical variables. I read the instructions here: lasso in python
But they are brief and do not say which variable types are allowed. For example, my code includes:
from sklearn.linear_model import Lasso
model = Lasso(fit_intercept=False, alpha=0.01)
model.fit(X, y)
In the code above, X is an n-by-p observation matrix. Can one of the p variables be categorical?
Upvotes: 4
Views: 3619
Reputation: 771
A previous poster has a good answer for this: you need to encode your categorical variables. The standard way is one-hot encoding (or dummy encoding), but there are many other methods for doing this.
Here is a good library that offers many different ways to encode your categorical variables. They are also implemented to work with scikit-learn.
https://contrib.scikit-learn.org/categorical-encoding/
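For the standard one-hot case, here is a minimal sketch that uses scikit-learn's own OneHotEncoder rather than the library linked above. The toy data and the column names (area, city) are made up for illustration, and alpha=0.01 is simply carried over from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one numeric column and one categorical column.
X = pd.DataFrame({
    "area": [50.0, 80.0, 120.0, 65.0, 95.0, 110.0],
    "city": ["A", "B", "C", "A", "B", "C"],
})
y = [150.0, 260.0, 410.0, 190.0, 300.0, 380.0]

# One-hot encode only the categorical column; pass the numeric one through.
encode = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"])],
    remainder="passthrough",
)

# The pipeline applies the encoding before fitting the lasso model.
model = make_pipeline(encode, Lasso(alpha=0.01))
model.fit(X, y)
predictions = model.predict(X)
```

Putting the encoder inside a pipeline means the same encoding is reapplied automatically when you call predict on new data.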
Upvotes: 1
Reputation: 11
You need to represent the categorical variables using 1s and 0s. If your categorical variables are binary, meaning each observation belongs to one of two categories, then you replace categories A and B with 0s and 1s, respectively. If some have more than two categories, you will need to use dummy variables.
I usually have my data in a pandas DataFrame, in which case I use houses = pd.get_dummies(houses), which creates the dummy variables.
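As a rough sketch of how that fits together with the lasso model from the question: the houses data below is invented for illustration (the columns area, heating, and price are assumptions), and only the get_dummies call comes from the line above.

```python
import pandas as pd
from sklearn.linear_model import Lasso

# Hypothetical houses data with one categorical column ("heating").
houses = pd.DataFrame({
    "area": [50.0, 80.0, 120.0, 65.0, 95.0, 110.0],
    "heating": ["gas", "electric", "oil", "gas", "electric", "oil"],
    "price": [150.0, 260.0, 410.0, 190.0, 300.0, 380.0],
})

# Separate the target, then expand the categorical column into 0/1 dummies.
y = houses["price"]
X = pd.get_dummies(houses.drop(columns="price"), columns=["heating"], dtype=float)

# X is now fully numeric, so it can be passed to Lasso directly.
model = Lasso(alpha=0.01)
model.fit(X, y)
```

Each category becomes its own 0/1 column (heating_electric, heating_gas, heating_oil here), and the lasso penalty can shrink the coefficient of any individual dummy to zero.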
Upvotes: 1