Reputation: 2341
I have a formula that looks as follows:
formula <- as.formula(y ~ x + as.factor(z) + A + as.factor(B) + C:as.factor(A) + as.factor(D) + E + F + as.factor(G))
I would like to extract the names of all variables that are wrapped in as.factor(), so that I can convert those columns to factors. If I use all.vars(formula), I get all variables, not just the ones inside as.factor().
Desired result:
factornames <- c("z", "B", "A", "D", "G")
I eventually want to feed the selected variables to:
# Convert the selected columns to factors
DF[factornames] <- lapply(DF[factornames], factor)
## turn factor variables into dummies
DF <- as.data.frame(model.matrix(phantom ~ ., transform(DF, phantom=0)))
Upvotes: 1
Views: 219
Reputation: 72593
We can deparse the formula, then use gregexpr to match everything in parentheses preceded by "factor", using this historic solution.
r <- Reduce(paste0, deparse(formula))
regmatches(r, gregexpr("(?<=factor\\().*?(?=\\))", r, perl=TRUE))[[1]]
# [1] "z" "B" "A" "D" "G"
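As an alternative to matching on the deparsed string, the formula can also be walked as a call tree. This is a different technique from the regex above, shown only as a minimal sketch: find_factor_vars is a hypothetical helper name, and it assumes the factor calls are written as plain as.factor(...) or factor(...), not as base::as.factor(...).

```r
formula <- y ~ x + as.factor(z) + A + as.factor(B) + C:as.factor(A) +
  as.factor(D) + E + F + as.factor(G)

# Hypothetical helper: recursively collect the variables that appear
# inside as.factor() or factor() calls anywhere in an expression.
find_factor_vars <- function(expr) {
  # Names and constants cannot contain a factor call.
  if (!is.call(expr)) return(character(0))
  fn <- expr[[1]]
  if (is.name(fn) && as.character(fn) %in% c("as.factor", "factor")) {
    return(all.vars(expr))
  }
  # Otherwise descend into the arguments of the call (dropping the function itself).
  found <- lapply(as.list(expr)[-1], find_factor_vars)
  as.character(unique(unlist(found, use.names = FALSE)))
}

factornames <- find_factor_vars(formula)
factornames
# [1] "z" "B" "A" "D" "G"
```

Because this works on the parsed expression, it does not depend on how deparse breaks long formulas across lines or on the spacing inside the parentheses.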
Upvotes: 1
Reputation: 388807
You can do some string manipulation to get the column names which are factors.
factornames <- stringr::str_match_all(as.character(formula)[3], 'as.factor\\(([A-Za-z])\\)')[[1]][,-1]
factornames
#[1] "z" "B" "A" "D" "G"
The ([A-Za-z]) part of the regex matches only a single letter, so it should be changed to fit the column names in your data.
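For example, here is a sketch of a wider character class for multi-character names. It uses base R (regmatches with a lookbehind) in place of stringr so that it runs without extra packages. The formula f2 and its column names are made up for illustration, and the class [A-Za-z0-9._]+ is an assumption about which characters your names contain.

```r
# Hypothetical formula with longer column names.
f2 <- y ~ as.factor(region_id) + as.factor(year2) + income

# Deparse only the right-hand side and collapse it into one string.
rhs <- paste(deparse(f2[[3]]), collapse = "")

# Match one or more letters, digits, dots or underscores inside as.factor(...).
pattern <- "(?<=as\\.factor\\()[A-Za-z0-9._]+(?=\\))"
factornames2 <- regmatches(rhs, gregexpr(pattern, rhs, perl = TRUE))[[1]]
factornames2
# [1] "region_id" "year2"
```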
Upvotes: 1