Reputation: 2413
I am trying to separate the levels of a factor variable from the variable name (the format returned by a model).
My string (sorry: edited to be more representative)
vars <- c("(Intercept)", "wt", "gearyy", "cyl4", "cyl8")
Expected outcome (although a list would also be fine)
#      [,1]   [,2]
# [1,] "wt"   ""
# [2,] "gear" "yy"
# [3,] "cyl"  "4"
# [4,] "cyl"  "8"
My attempt: I thought I might be able to use grep for a partial match, but had no success:
grep(paste0("\\b", "cyl", "\\b") , est$vars )
The model:
library(glmnet)
mtcars$gear <- factor(mtcars$gear, labels=c("xx", "yy", "zz"))
mtcars$am <- factor(mtcars$am, labels=c("yes", "no"))
mtcars$cyl <- factor(mtcars$cyl)
x <- model.matrix(~ wt + disp + gear + am + cyl, data=mtcars,
                  contrasts.arg = lapply(mtcars[sapply(mtcars, is.factor)],
                                         contrasts, contrasts=FALSE))
fit <- glmnet(x, mtcars$mpg)
cfs <- coef(fit, s=0.5)
est <- data.frame(vars=cfs@Dimnames[[1]][cfs@i+1], est=cfs@x, stringsAsFactors=FALSE)
Upvotes: 2
Views: 261
Reputation: 887971
Try
pat <- paste(colnames(mtcars), collapse="|")
v2 <- sub(pat, '', vars[-1])
v1 <- sub(paste(v2[nzchar(v2)], collapse='|'), '', vars[-1])
data.frame(v1, v2)
#     v1 v2
# 1   wt
# 2 gear yy
# 3  cyl  4
# 4  cyl  8
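One caveat: the pattern above is not anchored, so a variable name could in principle match in the middle of a term. A minimal sketch of an anchored variant follows, assuming every term starts with one of the column names of the data and that those names contain no regex metacharacters. The names `nms`, `terms` and `res` are only for illustration.

```r
vars <- c("(Intercept)", "wt", "gearyy", "cyl4", "cyl8")

# Candidate variable names, longest first, so a longer name is tried
# before a shorter one that shares the same prefix.
nms <- colnames(mtcars)
nms <- nms[order(nchar(nms), decreasing = TRUE)]

# Anchor at the start of the string; group 1 is the variable name,
# group 2 is whatever follows it (the factor level, possibly empty).
pat <- paste0("^(", paste(nms, collapse = "|"), ")(.*)$")

terms <- vars[-1]  # drop "(Intercept)"
res <- data.frame(
  v1 = sub(pat, "\\1", terms),
  v2 = sub(pat, "\\2", terms),
  stringsAsFactors = FALSE
)
res
```

A term that does not start with any known name is returned unchanged in both columns, so it is worth checking for such rows before relying on the result.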
According to comments from the OP, it may be better to anchor each level to the end of the string with `$`:
v1 <- sub(paste0(paste0(v2[nzchar(v2)], "+$"), collapse='|'), '', vars)
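If the regex route proves fragile (for example when a level looks like part of a variable name), a different technique is to avoid pattern matching altogether and build a lookup table from the factor levels in the data. This is only a sketch: it assumes the data frame used to fit the model is still available, and `d`, `lookup` and `out` are names made up for the example.

```r
vars <- c("(Intercept)", "wt", "gearyy", "cyl4", "cyl8")

# Recreate the factors as in the question (on a copy of mtcars).
d <- mtcars
d$gear <- factor(d$gear, labels = c("xx", "yy", "zz"))
d$am   <- factor(d$am, labels = c("yes", "no"))
d$cyl  <- factor(d$cyl)

# Every possible "name + level" term, with its two parts kept separately.
fac <- names(d)[sapply(d, is.factor)]
lookup <- do.call(rbind, lapply(fac, function(f) {
  data.frame(term = paste0(f, levels(d[[f]])),
             v1 = f, v2 = levels(d[[f]]),
             stringsAsFactors = FALSE)
}))

# Terms found in the lookup are split; anything else (numeric variables)
# keeps its name and gets an empty level.
terms <- vars[-1]  # drop "(Intercept)"
idx <- match(terms, lookup$term)
out <- data.frame(
  v1 = ifelse(is.na(idx), terms, lookup$v1[idx]),
  v2 = ifelse(is.na(idx), "", lookup$v2[idx]),
  stringsAsFactors = FALSE
)
out
```

Because the match is on the whole term, this does not depend on how the names and levels happen to be spelled.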
Upvotes: 3