Ridge regression within a loop

I am new to coding, so I still struggle with simple things such as loops, subsetting, and data frames vs. matrices.

I am trying to fit a ridge regression with a multivariable X (X1 = Marker 1, X2 = Marker 2, X3 = Marker 3, ..., X1333 = Marker 1333), shown in the first image, as the predictor of Y, shown in the second image.


I want to compute the sum of squared errors (SSE) for varying values of the tuning parameter λ (between 1 and 20). My code is the following:

#install.packages("MASS")
library(MASS)


fitridge <- function(x,y){
  fridge=lm.ridge (y ~ x, lambda = seq(0, 20, 2)) #Fitting a ridge regression for varying λ values
  sum(residuals(fridge)^2) #This results in SSE
}

all_gcv = apply(as.matrix(genmark_new), 2, fitridge, y = as.matrix(coleslev_new))

However, it returns this error, and I don't know what to do anymore. I have tried converting the data set into a matrix, a data frame, changing the order of rows and columns...

Error in colMeans(X[, -Inter]) : 'x' must be an array of at least two dimensions.

I would just like to take the marker values from each row (first picture) and pass them into my fitridge function, which fits a ridge regression against the Y from the second data set (second picture). Then I want to extract the SSE values and their corresponding lambda values.

Upvotes: 0

Views: 413

Answers (1)

StupidWolf

Reputation: 46958

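You cannot fit a ridge regression with only one independent variable; it is not meant for this. Your apply(..., 2, ...) call passes one marker column at a time to lm.ridge, which is what triggers the colMeans error. Most likely you want to fit all markers together, for example (using simulated data with the same layout as yours):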

# Simulated stand-ins for the real data: 100 samples x 1333 binary markers
genmark_new = data.frame(matrix(sample(0:1, 1333*100, replace=TRUE), ncol=1333))
colnames(genmark_new) = paste0("Marker_", 1:ncol(genmark_new))
coleslev_new = data.frame(NormalizedCholesterol=rnorm(100))
Y = coleslev_new$NormalizedCholesterol

library(MASS)
# One ridge fit per lambda, using all markers as predictors at once
fit = lm.ridge(y ~ ., data=data.frame(genmark_new, y=Y), lambda=seq(0, 20, 2))

Then calculate the sum of squared residuals for each lambda:

apply(fit$coef,2,function(i)sum((Y-as.matrix(genmark_new) %*% i)^2))
       0        2        4        6        8       10       12       14 
26.41866 27.88029 27.96360 28.04675 28.12975 28.21260 28.29530 28.37785 
      16       18       20 
28.46025 28.54250 28.62459
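One caveat about the calculation above: according to the MASS documentation, fit$coef holds coefficients for the scaled predictors and does not include an intercept, whereas coef(fit) returns them on the original scale with the intercept in the first column. A minimal sketch of an SSE computed from coef(fit) follows. The data here is a small simulated example (10 markers, 50 samples, arbitrary seed), not the data from the question, and the names X, y, lambdas and sse are only for illustration.

```r
library(MASS)

set.seed(1)  # arbitrary seed so the sketch is repeatable
n <- 50
p <- 10
# Hypothetical stand-in data: n samples, p numeric markers
X <- matrix(rnorm(n * p), ncol = p,
            dimnames = list(NULL, paste0("Marker_", 1:p)))
y <- rnorm(n)

lambdas <- seq(0, 20, 2)
fit <- lm.ridge(y ~ ., data = data.frame(X, y = y), lambda = lambdas)

# coef(fit) has one row per lambda; column 1 is the intercept and the
# remaining columns are the marker coefficients on the original scale
beta <- coef(fit)

# Fitted values for every lambda at once: an n x length(lambdas) matrix
fitted_vals <- cbind(1, X) %*% t(beta)

# Sum of squared errors per lambda, paired with the lambda values
sse <- colSums((y - fitted_vals)^2)
names(sse) <- lambdas
result <- data.frame(lambda = lambdas, SSE = sse)
result
```

The result data frame gives the SSE next to its lambda, which is the pairing asked for in the question. With lambda = 0 and fewer predictors than samples, the first SSE should agree with an ordinary lm fit.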

If you need to fit each variable separately, you can consider using a linear model:

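fitlm <- function(x, y){
  fit_single = lm(y ~ x)         # ordinary least squares on one marker
  sum(residuals(fit_single)^2)   # SSE for that marker
}

# One SSE per marker column
all_gcv = apply(genmark_new, 2, fitlm, y=Y)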

As a suggestion, check out some notes or an introduction to ridge regression. It is meant for multiple regression, that is, models with more than one independent variable.
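As for where the error in the question comes from: the message shows lm.ridge calling colMeans(X[, -Inter]), that is, on the design matrix with the intercept column removed. With a single predictor only one column is left, and R simplifies it to a plain vector, which colMeans rejects. The sketch below reproduces that behaviour on a small hand-made matrix; X, dropped, res and kept are illustrative names, not part of MASS.

```r
# A design matrix with an intercept column and a single predictor,
# similar to what a one-predictor formula y ~ x produces
X <- cbind(Intercept = 1, x = c(0, 1, 1, 0, 1))

# Removing the intercept leaves one column, which R turns into a vector
dropped <- X[, -1]
is.null(dim(dropped))   # TRUE: no longer a matrix

# colMeans needs at least two dimensions, so this call fails
res <- try(colMeans(dropped), silent = TRUE)
inherits(res, "try-error")

# Keeping the matrix shape with drop = FALSE avoids the failure
kept <- X[, -1, drop = FALSE]
colMeans(kept)
```

This is only meant to explain the error message; it does not make a one-predictor ridge fit meaningful.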

Upvotes: 1
