NTG

Reputation: 25

Logistic Regression with glmnet - structure of input data

I am trying to apply ridge and lasso regularization to a logistic regression model and am struggling to understand the structure required for the x and y inputs. I am fairly new to R, so apologies if this is unclear. As I understand it, the values in the columns of x are used to predict the outcomes in y.

For x I have seven columns, each containing categorical data (as factors). The whole of x is a data frame with 9000 observations of 7 variables; each variable is a factor with a varying number of levels. It appears in the Environment pane under Data.

y is a set of outcomes, "0" or "1". It appears in the Environment pane under Values, which says y is a Factor w/ 2 levels "0","1", also with 9000 values.

I am struggling to work out what structure x and y need to have for the following to work for a logistic model:

alpha0.fit <- cv.glmnet(x, y , type.measure="deviance", alpha=0, family="binomial")

Any thoughts or advice gratefully received.

Upvotes: 1

Views: 668

Answers (1)

StupidWolf

Reputation: 46978

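glmnet expects x to be a numeric matrix, so a data frame of factors has to be converted first. You can use dummy encoding as proposed in the comments, or you can use glmnetUtils, whose formula interface takes care of this: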

library(glmnetUtils)
x = data.frame(x1 = sample(c("A","B","C"),9000,replace=TRUE),
               x2 = sample(c("D","E"),9000,replace=TRUE),
               x3 = sample(c("F","G","H"),9000,replace=TRUE)
               )

y = factor(sample(0:1,9000,replace=TRUE))

fit = cv.glmnet(y ~ .,data=data.frame(x,y),family="binomial",alpha=0)
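
For the dummy-encoding route, here is a minimal sketch using base R's model.matrix. It assumes simulated data shaped like the example above (the column names x1, x2, x3 and the object name x_mat are illustrative, not taken from the question), and it is only one way to build the matrix:

```r
library(glmnet)

set.seed(1)  # simulated data, for illustration only
x <- data.frame(x1 = factor(sample(c("A", "B", "C"), 9000, replace = TRUE)),
                x2 = factor(sample(c("D", "E"), 9000, replace = TRUE)),
                x3 = factor(sample(c("F", "G", "H"), 9000, replace = TRUE)))
y <- factor(sample(0:1, 9000, replace = TRUE))

# model.matrix expands each factor into 0/1 dummy columns
# (one level per factor is the reference); [, -1] drops the intercept
# column, since glmnet fits its own intercept.
x_mat <- model.matrix(~ ., data = x)[, -1]

# x is now a numeric matrix; y can stay a two-level factor for family = "binomial".
alpha0.fit <- cv.glmnet(x_mat, y, type.measure = "deviance",
                        alpha = 0, family = "binomial")
```

Any new data has to be encoded with the same factor levels before it is passed to predict, otherwise the columns will not line up with those used for fitting.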

Upvotes: 1
