user2957945

Separate variable name from factor level in character string

I am trying to separate the levels of a factor variable from the variable name in the coefficient names returned by a model, where the variable name and level are pasted together (e.g. gearyy).

My character vector (edited to be more representative):

vars <- c("(Intercept)", "wt", "gearyy", "cyl4", "cyl8")

Expected outcome (a list would also be fine):

#     [,1]   [,2]
#[1,] "wt"   ""  
#[2,] "gear" "yy" 
#[3,] "cyl"  "4" 
#[4,] "cyl"  "8" 

My attempt: I thought I might be able to use grep for a partial match, but had no success:

grep(paste0("\\b", "cyl", "\\b"), est$vars)
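A likely reason the grep attempt finds nothing: \b only matches at a boundary between a word character and a non-word character. In "cyl4" and "cyl8" the character after "cyl" is a digit, which is itself a word character, so there is no boundary there. A minimal sketch (not from the original post) contrasting the two patterns on the vars vector; perl = TRUE is used here only so the meaning of \b is unambiguous:

```r
vars <- c("(Intercept)", "wt", "gearyy", "cyl4", "cyl8")

# "\\bcyl\\b" needs a word boundary right after "cyl"; in "cyl4" and "cyl8"
# the next character is a digit (a word character), so nothing matches
word_bounded <- grepl("\\bcyl\\b", vars, perl = TRUE)
word_bounded   # FALSE FALSE FALSE FALSE FALSE

# Anchoring only at the start of the string matches every name beginning with "cyl"
prefix_match <- grepl("^cyl", vars)
prefix_match   # FALSE FALSE FALSE TRUE TRUE
```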


The model:

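library(glmnet)

# Convert gear, am and cyl to factors with the given level labels
mtcars$gear <- factor(mtcars$gear, labels=c("xx", "yy", "zz"))
mtcars$am <- factor(mtcars$am, labels=c("yes", "no"))
mtcars$cyl <- factor(mtcars$cyl)

# Full dummy coding: one column per level for every factor (contrasts=FALSE)
x <- model.matrix(~ wt + disp + gear + am + cyl, data=mtcars,
                  contrasts.arg = lapply(mtcars[sapply(mtcars, is.factor)], 
                                         contrasts, contrasts=FALSE))

fit <- glmnet(x, mtcars$mpg)
cfs <- coef(fit, s=0.5)   # sparse coefficient matrix at lambda = 0.5

# Keep only the non-zero coefficients: cfs@i holds their 0-based row indices, cfs@x their values
est <- data.frame(vars=cfs@Dimnames[[1]][cfs@i+1], est=cfs@x, stringsAsFactors=FALSE)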
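Since the names come from model.matrix, the design matrix itself records which variable each column belongs to: its "assign" attribute gives the term index of every column (0 for the intercept). A sketch of an alternative that is not from the original post, assuming the same formula and factor coding as above. It rebuilds x without glmnet, on a copy named cars (a hypothetical name) so mtcars is left unchanged:

```r
# Rebuild the same full-dummy design matrix on a copy of mtcars
cars <- mtcars
cars$gear <- factor(cars$gear, labels = c("xx", "yy", "zz"))
cars$am   <- factor(cars$am, labels = c("yes", "no"))
cars$cyl  <- factor(cars$cyl)

f <- ~ wt + disp + gear + am + cyl
x <- model.matrix(f, data = cars,
                  contrasts.arg = lapply(cars[sapply(cars, is.factor)],
                                         contrasts, contrasts = FALSE))

# "assign" maps each column of x to a term index (0 = intercept)
term_labels <- attr(terms(f), "term.labels")
var_name <- c("(Intercept)", term_labels)[attr(x, "assign") + 1]

# The level is whatever follows the variable name ("" for wt, disp, intercept)
level <- substring(colnames(x), nchar(var_name) + 1)

split_names <- cbind(var_name, level)
split_names
```

The rows for the names kept in est could then be looked up with match(est$vars, colnames(x)).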


Answers (1)

akrun

Try

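 pat <- paste(colnames(mtcars), collapse="|")
 # remove the variable name, leaving the level ("" for a numeric term such as wt)
 v2 <- sub(pat, '', vars[-1])
 # remove the non-empty levels, leaving the variable name; vars[-1] drops "(Intercept)"
 v1 <- sub(paste(v2[nzchar(v2)], collapse='|'), '', vars[-1])
 data.frame(v1, v2)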
 #    v1 v2
 #1   wt   
 #2 gear yy
 #3  cyl  4
 #4  cyl  8
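A variant sketch, not part of the original answer: instead of one alternation over colnames(mtcars), each term can be checked against an explicit vector of the model's variable names with startsWith(), keeping the longest matching name so that a variable whose name is a prefix of another (a hypothetical carb vs carbon) cannot be picked by mistake. var_names and split_term are hypothetical names:

```r
vars <- c("(Intercept)", "wt", "gearyy", "cyl4", "cyl8")
var_names <- c("wt", "disp", "gear", "am", "cyl")  # assumed: the model's variables

split_term <- function(term, var_names) {
  # keep only the variable names that are a prefix of this term
  hits <- var_names[startsWith(term, var_names)]
  if (length(hits) == 0) return(c(term, ""))  # e.g. "(Intercept)"
  # prefer the longest matching name
  v <- hits[which.max(nchar(hits))]
  c(v, substring(term, nchar(v) + 1))
}

# one row per term: variable name, level
res <- t(vapply(vars[-1], split_term, character(2), var_names = var_names))
res
```

This returns a matrix in the same shape as the expected outcome in the question.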

Update

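According to comments from the OP, it may be better to anchor each level to the end of the name with "$" (and to apply it to vars[-1], as above, so that v1 and v2 have the same length):

 v1 <- sub(paste0(v2[nzchar(v2)], "$", collapse='|'), '', vars[-1])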
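Anchoring a level pattern to the end of the string with $ matters when a level's text also occurs inside a variable name. A small illustration, assuming a hypothetical factor grade with levels "a" and "b" (not part of the question's data):

```r
# Hypothetical factor "grade" with levels "a" and "b"
terms_g  <- c("gradea", "gradeb")
levels_g <- c("a", "b")

# Unanchored: sub() removes the first "a" or "b" found anywhere in the name
unanchored <- sub(paste(levels_g, collapse = "|"), "", terms_g)
unanchored   # "grdea" "grdeb"

# Anchored: "a$|b$" only removes a level at the end of the name
anchored <- sub(paste0(levels_g, "$", collapse = "|"), "", terms_g)
anchored     # "grade" "grade"
```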
