user14622762

Lasso Regression coefficients to find a linear model

I am fitting linear models in R. My predictors include birth rates, death rates, infant mortality rates, life expectancies, and region. region has 7 levels, each represented by a number:

  1. East Asia & Pacific
  2. South Asia
  3. Europe & Central Asia
  4. North America
  5. Latin America
  6. Middle East & North Africa
  7. Sub-Saharan Africa

I ran a Lasso Regression in R to try to improve the generalized linear model. The Lasso Regression coefficients are as follows:
[image: table of Lasso Regression coefficients, not reproduced]

I will put the factors selected by Lasso Regression into the lm function in R:

Lasso.lm <- lm(log(GNIpercapita) ~ deathrate + infantdeaths + life.exp.avg + 
                                    life.exp.diff + region, data=econdev) 

However, how do I add individual regions to the linear model in lm()? For example, for regionEast Asia & Pacific, I can't just add + regionEast Asia & Pacific.

Upvotes: 2

Views: 641

Answers (2)

benjasast

Reputation: 107

I agree with the previous comments that it is not recommended to pick and choose parts of a categorical variable. If you would still like to do it, the modeldb package makes it easy to create dummy variables for each level of your categorical variable. Remember that in your lm() regression you have to leave one level of the categorical variable out to avoid perfect collinearity.

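library(dplyr)    # provides the %>% pipe
library(modeldb)

# df is your data frame (econdev in the question)
df %>% 
  add_dummy_variables(region, auto_values = TRUE)  # detect the levels of region from the data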
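
For reference, the same dummy-variable idea can be sketched in base R with model.matrix(), without extra packages. This is only an illustration: the data is simulated, the column names merely mimic those in the question, and the choice of which region dummies to keep is arbitrary.

```r
# Simulated stand-in for the question's econdev data (all values are made up).
set.seed(1)
regions <- c("East Asia & Pacific", "South Asia", "Europe & Central Asia",
             "North America", "Latin America",
             "Middle East & North Africa", "Sub-Saharan Africa")
n <- 140
econdev <- data.frame(
  region       = factor(rep(regions, each = n / 7), levels = regions),
  deathrate    = runif(n, 2, 15),
  infantdeaths = runif(n, 2, 90)
)
econdev$GNIpercapita <- exp(9 - 0.05 * econdev$deathrate -
                            0.02 * econdev$infantdeaths + rnorm(n, sd = 0.3))

# One 0/1 column per level; "- 1" removes the intercept so every level gets a column.
dummies <- model.matrix(~ region - 1, data = econdev)
colnames(dummies) <- make.names(colnames(dummies))  # syntactic names usable in a formula
dat <- cbind(econdev, dummies)

# Keep only some region dummies (hypothetical selection). The regions left out
# are pooled together and absorbed into the intercept as the baseline.
fit <- lm(log(GNIpercapita) ~ deathrate + infantdeaths +
            regionSouth.Asia + regionSub.Saharan.Africa, data = dat)
coef(fit)
```

Note that the coefficients of the kept dummies are then contrasts against the pooled group of all omitted regions, which is part of why selecting individual levels is discouraged.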

Upvotes: 0

sconfluentus

Reputation: 4993

You cannot use pieces and parts of the category.

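You can eliminate numerical variables, or entire categorical variables, but you cannot pick and choose individual categories: each level's coefficient is measured relative to the reference level, so dropping some levels silently merges them into the baseline and changes what the remaining coefficients mean.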

You might be better off using the fitted Lasso Regression itself and predicting from it. It is no less of a regression because of the regularization. It is more complex, more robust, and less straightforward, but not 'worse'.
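
A minimal sketch of predicting directly from the Lasso fit, assuming the glmnet package (the post does not say which package produced the coefficients). The data is simulated and the region labels R1 to R7 are placeholders.

```r
library(glmnet)  # assumption: the original post does not name the Lasso package

# Simulated stand-in for econdev; every value here is made up.
set.seed(42)
n <- 210
econdev <- data.frame(
  region       = factor(rep(paste0("R", 1:7), length.out = n)),
  deathrate    = runif(n, 2, 15),
  infantdeaths = runif(n, 2, 90)
)
econdev$GNIpercapita <- exp(9 - 0.05 * econdev$deathrate -
                            0.02 * econdev$infantdeaths + rnorm(n, sd = 0.3))

# glmnet needs a numeric matrix: expand region into dummies and drop the intercept column.
x <- model.matrix(~ deathrate + infantdeaths + region, data = econdev)[, -1]
y <- log(econdev$GNIpercapita)

cv_fit <- cv.glmnet(x, y, alpha = 1)                  # alpha = 1 selects the lasso penalty
coef(cv_fit, s = "lambda.min")                        # coefficients at the cross-validated lambda
pred <- predict(cv_fit, newx = x, s = "lambda.min")   # predictions on the log scale
head(exp(pred))                                       # back-transformed to the GNI per capita scale
```

Here the region levels that the Lasso shrinks to zero simply contribute nothing to the prediction, so there is no need to rebuild the model by hand in lm().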

If that does not work for you, you can run lm() with the selected continuous variables and the entire region variable, accepting that the model is imperfect (as all models are), or remove region and settle for what may be a less predictive model.
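
The two lm() options can be compared side by side. The sketch below uses simulated data with placeholder region labels and made-up effects, so only the mechanics carry over, not the numbers.

```r
# Simulated stand-in for econdev; values, labels, and effects are made up.
set.seed(7)
n <- 210
econdev <- data.frame(
  region       = factor(rep(paste0("R", 1:7), length.out = n)),
  deathrate    = runif(n, 2, 15),
  infantdeaths = runif(n, 2, 90)
)
region_effect <- c(0, 0.4, -0.3, 0.8, 0.1, -0.2, -0.6)[as.integer(econdev$region)]
econdev$GNIpercapita <- exp(9 + region_effect - 0.05 * econdev$deathrate -
                            0.02 * econdev$infantdeaths + rnorm(n, sd = 0.3))

# Option 1: keep the whole region factor (lm drops one level as the reference).
fit_with <- lm(log(GNIpercapita) ~ deathrate + infantdeaths + region, data = econdev)

# Option 2: drop region entirely.
fit_without <- lm(log(GNIpercapita) ~ deathrate + infantdeaths, data = econdev)

# F-test of region as a whole, plus AIC for a rough comparison of the two fits.
cmp <- anova(fit_without, fit_with)
cmp
AIC(fit_without, fit_with)
```

Testing region as a single block like this keeps the factor intact, which is the point of both options above.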

Upvotes: 0
