Reputation: 13
In this paper, the authors perform radiomics feature selection for survival prediction by:
I would like to replicate this approach (albeit for logistic regression rather than Cox regression).
I am able to use the following R code to obtain the top K features from the Lasso models using the 'boot' library:
library(boot)
library(glmnet)

lasso_Select <- function(x, indices) {
  x <- x[indices, ]                               # bootstrap resample of the rows
  y <- x$Outcome
  x2 <- as.matrix(subset(x, select = -Outcome))   # predictors only
  # Pick lambda by cross-validation, then refit at lambda.min
  cv <- cv.glmnet(x2, y, family = "binomial", alpha = 1, standardize = TRUE)
  fit <- glmnet(x2, y, family = "binomial", alpha = 1, lambda = cv$lambda.min, standardize = TRUE)
  return(coef(fit)[, 1])
}

myBootstrap <- boot(scaled_train, lasso_Select, R = 1000, parallel = "multicore", ncpus = 5)
However, I don't believe I can access the individual resampled datasets to then run the multiple logistic regression models and choose the most common.
Any advice on how to approach this?
Upvotes: 1
Views: 919
Reputation: 11
There are a few R packages that also enable relatively transparent bootstrap LASSO analysis.
Check out:
You don't have to use the boot package explicitly: you can simply loop through the lasso procedure and save off the coefficients.
Saving off each individual resampled data set is going to get memory-expensive really fast. I would suggest saving only the sampling scheme per iteration, i.e. the rows and columns that were sampled.
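The loop described above could look roughly like the following. This is only a sketch: the toy data frame stands in for `scaled_train`, and the names `idx_mat`, `coef_mat` and `selection_freq` are made up for illustration. Only row indices are stored, since all columns are used in every iteration here.

```r
library(glmnet)

set.seed(1)
# Hypothetical toy data standing in for `scaled_train`
n <- 100
p <- 5
toy <- as.data.frame(matrix(rnorm(n * p), n, p))
toy$Outcome <- rbinom(n, 1, plogis(1.5 * toy$V1 - toy$V2))

x_all <- as.matrix(toy[, setdiff(names(toy), "Outcome")])
y_all <- toy$Outcome

B <- 20  # number of bootstrap resamples (kept small for illustration)
idx_mat  <- matrix(NA_integer_, nrow = B, ncol = n)   # sampled rows per iteration
coef_mat <- matrix(NA_real_, nrow = B, ncol = p + 1)  # intercept + p coefficients
colnames(coef_mat) <- c("(Intercept)", colnames(x_all))

for (b in seq_len(B)) {
  idx <- sample.int(n, n, replace = TRUE)  # the sampling scheme for this iteration
  idx_mat[b, ] <- idx
  cv <- cv.glmnet(x_all[idx, ], y_all[idx], family = "binomial", alpha = 1)
  coef_mat[b, ] <- coef(cv, s = "lambda.min")[, 1]
}

# Fraction of resamples in which each feature had a non-zero coefficient
selection_freq <- colMeans(coef_mat[, -1, drop = FALSE] != 0)
```

Any resample can be rebuilt later as `toy[idx_mat[b, ], ]`, so follow-up logistic regression models can be fitted on exactly the same rows without keeping B copies of the data.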
Your particular ask was something that I had not thought of with fastFeatures, and is a great idea to keep as an option for further analysis.
Upvotes: 1
Reputation: 575
As the manual page for boot() explains:
For most of the boot methods the resampling is done in the master process, but not if simple = TRUE nor sim = "parametric".
As you are not doing parametric bootstrapping and you don't need to specify simple = TRUE, the code displayed when you type boot::boot at the R prompt shows how the resampled data indices are generated. The critical code is:
if (!simple)
    i <- index.array(n, R, sim, strata, m, L, weights)
where n is the number of data rows, R is the number of bootstrap samples, and the other arguments are defined in the call to boot() and don't seem to apply to your situation. Typing boot:::index.array shows the code for that function, which in turn calls boot:::ordinary.array for your situation. In your situation, i is just a matrix showing which data rows to use for each bootstrap sample.
It should be reasonably straightforward to tweak the code for boot() to return that matrix of indices along with the other values the function normally returns.
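If editing boot() feels heavy, it may be worth knowing that the boot package also exports boot.array(), which regenerates the index matrix from the random seed stored in the returned object. A minimal sketch, using hypothetical toy data and a simple mean statistic (`mean_stat`) in place of the lasso fit:

```r
library(boot)

set.seed(42)
toy <- data.frame(x = rnorm(30))  # hypothetical toy data

# Simple statistic standing in for lasso_Select()
mean_stat <- function(d, indices) mean(d$x[indices])

bt <- boot(toy, mean_stat, R = 50)

# R x n matrix: row r lists the data rows drawn for bootstrap sample r
idx <- boot.array(bt, indices = TRUE)

# Rebuild the first resample and recompute its statistic for comparison
resample_1 <- toy[idx[1, ], , drop = FALSE]
recomputed <- mean(resample_1$x)
stored     <- bt$t[1, 1]
```

Comparing `recomputed` with `stored` is a quick sanity check that the regenerated indices match the ones used during the run; the same check is worth repeating on the real object before relying on it.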
An alternative might be to return indices directly in your lasso_Select() function, although I'm not sure how well the boot() function would handle that.
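One way this might work, sketched under the assumption that every resample has the same size (true for the ordinary bootstrap), is to append the indices to the vector the statistic returns and split them off afterwards. The toy data and the name `stat_with_idx` are hypothetical, and a mean again stands in for the lasso coefficients:

```r
library(boot)

set.seed(7)
toy <- data.frame(x = rnorm(25))  # hypothetical toy data

# Return the statistic followed by the indices that produced it
stat_with_idx <- function(d, indices) c(mean(d$x[indices]), indices)

bt <- boot(toy, stat_with_idx, R = 40)

estimates <- bt$t[, 1]             # the actual statistic
idx <- bt$t[, -1, drop = FALSE]    # R x n matrix of row indices
storage.mode(idx) <- "integer"     # boot stores everything as doubles

# The indices in row 1 reproduce the first bootstrap estimate
check <- mean(toy$x[idx[1, ]])
```

The drawback is that boot() treats the index columns as statistics too, so the bias and standard error it prints for those columns are meaningless and should be ignored.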
Upvotes: 1