Tom

Reputation: 2341

Get all the factor variables from a formula call

I have a formula that looks as follows:

formula <- as.formula(y ~ x + as.factor(z) + A + as.factor(B) + C:as.factor(A) + as.factor(D) + E + F + as.factor(G))

I would like to extract the names of all variables that are wrapped in as.factor(), so that I can convert those columns to factors. If I use all.vars(formula), I get every variable, not just the ones inside as.factor().

Desired result:

factornames <- c("z", "B", "A", "D", "G")

I eventually want to feed the selected variables to:

# Convert the selected columns to factors
DF[factornames] <- lapply(DF[factornames], factor)
# Expand the factor columns into dummy variables
DF <- as.data.frame(model.matrix(phantom ~ ., transform(DF, phantom = 0)))

Upvotes: 1

Views: 219

Answers (2)

jay.sf

Reputation: 72593

We can deparse the formula, then use gregexpr to match everything in parentheses preceded by "factor", using this historic solution.

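# deparse() may split a long formula over several strings, so join them first
r <- Reduce(paste0, deparse(formula))
# Lookbehind/lookahead keep only the text between "factor(" and the next ")"
regmatches(r, gregexpr("(?<=factor\\().*?(?=\\))", r, perl = TRUE))[[1]]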
# [1] "z" "B" "A" "D" "G"
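
If an argument of as.factor() could itself contain parentheses, the regex above would stop at the first ")". A regex-free variant might walk the formula's call tree instead. The following is only a minimal sketch in base R; factor_vars is a hypothetical helper name, not a function from any package, and it assumes the calls are written literally as as.factor(...).

```r
# Hypothetical helper: collect variables that appear inside as.factor() calls
factor_vars <- function(expr) {
  # Symbols and constants contain no calls, so there is nothing to collect
  if (!is.call(expr)) return(character(0))
  # Found as.factor(...): return every variable used in its argument
  if (identical(expr[[1]], as.name("as.factor"))) {
    return(all.vars(expr))
  }
  # Otherwise recurse into the arguments of the call (dropping the function name)
  as.character(unique(unlist(lapply(as.list(expr)[-1], factor_vars))))
}

formula <- y ~ x + as.factor(z) + A + as.factor(B) + C:as.factor(A) +
  as.factor(D) + E + F + as.factor(G)
factornames <- factor_vars(formula)
factornames
# [1] "z" "B" "A" "D" "G"
```

This sketch does not handle calls with empty arguments (such as x[, 1]), which would need an extra check before recursing.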

Upvotes: 1

Ronak Shah

Reputation: 388807

You can do some string manipulation to get the column names which are factors.

factornames <- stringr::str_match_all(as.character(formula)[3], 'as\\.factor\\(([A-Za-z])\\)')[[1]][,-1]
factornames
#[1] "z" "B" "A" "D" "G"

The ([A-Za-z]) part of the regex matches only single-letter names, so it should be changed to fit the column names in your data.
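
As a rough illustration of widening the capture group, the sketch below uses a made-up formula with multi-character column names (region_id and year2020 are assumptions, not columns from the question). It uses base R's regmatches() and sub() in place of stringr so that it runs without extra packages; the character class shown is just one plausible choice.

```r
# Hypothetical formula with longer column names (not from the question)
f <- y ~ x + as.factor(region_id) + as.factor(year2020) + income

# Right-hand side of the formula as a single string
rhs <- as.character(f)[3]

# Capture group widened from one letter to letters, digits, "." and "_"
pattern <- "as\\.factor\\(([A-Za-z0-9._]+)\\)"

# Find every as.factor(...) term, then keep only the captured name
terms_found <- regmatches(rhs, gregexpr(pattern, rhs))[[1]]
factornames <- sub(pattern, "\\1", terms_found)
factornames
# [1] "region_id" "year2020"
```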

Upvotes: 1
