tunar

Reputation: 189

Can the input for "Lasso" in Python contain categorical variables?

I want to perform multiple linear regression in Python with lasso, but I am not sure whether the input observation matrix X can contain categorical variables. I read the instructions here: lasso in python

But they are brief and do not indicate which input types are allowed. For example, my code includes:

from sklearn.linear_model import Lasso

# X is the n-by-p observation matrix, y the target vector
model = Lasso(fit_intercept=False, alpha=0.01)
model.fit(X, y)

In the code above, X is an n-by-p observation matrix. Can one of the p variables be categorical?

Upvotes: 4

Views: 3619

Answers (2)

jawsem

Reputation: 771

As a previous poster said, you need to encode your categorical variables. The standard way is one-hot encoding (also called dummy encoding), but there are many other methods.

Here is a good library that offers many different ways to encode categorical variables. Its encoders are implemented to work with scikit-learn.

https://contrib.scikit-learn.org/categorical-encoding/
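As a rough illustration of one-hot encoding inside a scikit-learn workflow, here is a minimal sketch. It uses scikit-learn's built-in OneHotEncoder rather than the library linked above, and the column names ("size", "color"), the data values, and alpha are made up for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one numeric and one categorical predictor.
X = pd.DataFrame({
    "size": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "color": ["red", "blue", "green", "red", "green", "blue"],
})
y = [1.2, 1.9, 3.4, 4.1, 5.3, 5.8]

# One-hot encode "color"; pass the numeric column through unchanged.
encode = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["color"])],
    remainder="passthrough",
)

# Chain the encoding and the lasso fit so both run on every fit/predict call.
model = Pipeline([("encode", encode), ("lasso", Lasso(alpha=0.01))])
model.fit(X, y)

# Three indicator columns for "color" plus the numeric "size" column.
n_features = model.named_steps["lasso"].coef_.shape[0]
predictions = model.predict(X)
```

Putting the encoder in the pipeline means new data is encoded the same way as the training data when you call predict.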

Upvotes: 1

Daniel Tamming

Reputation: 11

You need to represent the categorical variables using 1s and 0s. If a categorical variable is binary, meaning each observation belongs to one of two categories, replace the category A and category B values with 0s and 1s, respectively. If a variable has more than two categories, you will need to use dummy variables: one 0/1 column per category.

I usually keep my data in a pandas DataFrame, in which case I use houses = pd.get_dummies(houses) to create the dummy variables.
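A minimal sketch of that approach, end to end, might look like the following. The houses data, the column names ("area", "district"), and the price values are invented for illustration.

```python
import pandas as pd
from sklearn.linear_model import Lasso

# Hypothetical toy data: one numeric column and one categorical column.
houses = pd.DataFrame({
    "area": [50.0, 80.0, 120.0, 65.0, 95.0, 140.0],
    "district": ["north", "south", "east", "north", "east", "south"],
})
price = [150.0, 210.0, 330.0, 180.0, 270.0, 350.0]

# Expand the categorical column into 0/1 indicator columns;
# the numeric column is left as it is.
X = pd.get_dummies(houses, columns=["district"], dtype=float)

model = Lasso(alpha=0.01)
model.fit(X, price)

# Pair each column name with its fitted coefficient.
coefficients = dict(zip(X.columns, model.coef_))
```

After get_dummies, X contains only numeric columns, which is the form Lasso's fit method expects.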

Upvotes: 1
