Reputation: 107
How would I extract 'mpg' from the following formula in R? I understand that it would be useful to convert the formula into character first and then use some kind of regex. But I don't know which one.
mpg ~ x1 + x2
Upvotes: 1
Views: 87
Reputation: 32548
Here's an approach that uses regex
x = mpg ~ x1 + x2
gsub(" ","",gsub("~.*", "", deparse(x)))
#[1] "mpg"
Upvotes: 3
Reputation: 17369
All of the given answers will work for your specific use case. But if you wish to use this in a more general setting, there are some caveats to be aware of. To discuss these, we'll define a few formulae:
fm <- mpg ~ x1 + x2
fm_one <- ~ x1 + x2
fm_multi <- mpg + y1 ~ x1 + x2
all.vars will return a character vector of all of the variables in the formula. It is the fastest of the options given so far. However, it does not distinguish between variables on the left-hand side and the right-hand side of the formula. Whether this is acceptable depends on your use case.
all.vars(fm)[1] # "mpg"
all.vars(fm_one)[1] # "x1" (this is a right hand side variable)
all.vars(fm_multi)[1] # "mpg" (missing other left hand side variables)
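The terms approach (as.character(attr(terms(fm), "variables"))) will generate a similar vector, but the variable names start in the second position (the list call takes up the first element). Like the all.vars approach, it cannot tell you whether the second element comes from the left-hand side: for a one-sided formula it is a right-hand side variable. For a left-hand side with several terms, it returns the whole expression as a single string.
as.character(attr(terms(fm), "variables"))[2] # "mpg"
as.character(attr(terms(fm_one), "variables"))[2] # "x1" (this is a right hand side variable)
as.character(attr(terms(fm_multi), "variables"))[2] # "mpg + y1"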
Using as.character produces a character vector of length 3 or 2, depending on whether the formula has a left-hand side. This can at least return the entire left side, but as a single string rather than as a character vector of the response variables. It still has the disadvantage that the second element is not guaranteed to be the left side: for a one-sided formula it is the right-hand side.
as.character(fm) # "~" "mpg" "x1 + x2"
as.character(fm_one) # "~" "x1 + x2"
as.character(fm_multi) # "~" "mpg + y1" "x1 + x2"
The deparse method is somewhat slower than all.vars (but still measured in nanoseconds), and has the primary advantage of distinguishing the left-hand side from the right-hand side.
gsub(" ","",gsub("~.*", "", deparse(fm))) # "mpg"
gsub(" ","",gsub("~.*", "", deparse(fm_one))) # ""
gsub(" ","",gsub("~.*", "", deparse(fm_multi))) # "mpg+y1"
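One more caveat worth keeping in mind with the deparse approach: deparse can split a long formula across several strings (its default width.cutoff is 60), in which case the regex runs on each piece separately. Here is a minimal sketch of one way around that, collapsing the pieces first. The names fm_long and lhs_string are hypothetical and exist only for this illustration.

```r
# Hypothetical long formula, used only to force deparse() to wrap its output
fm_long <- mpg ~ a_rather_long_predictor_name_1 + a_rather_long_predictor_name_2 +
  a_rather_long_predictor_name_3 + a_rather_long_predictor_name_4

length(deparse(fm_long))  # more than 1: the formula is split across several strings

# Hypothetical helper: collapse the pieces into one string, then apply the same regex
lhs_string <- function(f) {
  txt <- paste(deparse(f), collapse = " ")
  gsub(" ", "", gsub("~.*", "", txt))
}

lhs_string(fm_long)        # "mpg"
lhs_string(mpg + y1 ~ x1)  # "mpg+y1"
```

Whether this matters depends on how long your formulae get; for short ones the original one-liner behaves the same.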
Depending on your actual needs, you may not need to protect against one-sided or multivariate formulae. If you are working in a system where all of your formulae are known to be univariate and two-sided, all.vars is probably your best bet. If you can't be sure of that, I'd recommend using the deparse method. That will at least ensure that you always get response variables when you are looking for response variables.
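If you want the response variables as a character vector rather than a single string, one possible middle ground is to apply all.vars to the left-hand side only. This is a rough sketch rather than a definitive implementation; response_vars is a hypothetical name, and it relies on the fact that a two-sided formula is a call of length 3.

```r
# Hypothetical helper: names of the variables on the left-hand side only
response_vars <- function(f) {
  stopifnot(inherits(f, "formula"))
  # A two-sided formula is a call of length 3: `~`, lhs, rhs.
  # A one-sided formula has length 2, so it has no response.
  if (length(f) < 3) return(character(0))
  all.vars(f[[2]])
}

response_vars(mpg ~ x1 + x2)       # "mpg"
response_vars(~ x1 + x2)           # character(0)
response_vars(mpg + y1 ~ x1 + x2)  # "mpg" "y1"
```

Note that this returns variable names, so a response such as log(mpg) would come back as "mpg"; use deparse(f[[2]]) instead if you need the expression as written.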
Upvotes: 3
Reputation: 887028
We can use all.vars
all.vars(form)[1]
#[1] "mpg"
Or with terms
as.character(attr(terms(form), "variables")[[2]])
#[1] "mpg"
Or another option is
paste(form)[[2]]
#[1] "mpg"
where
form <- mpg ~ x1 + x2
Upvotes: 5
Reputation: 24252
Given the formula:
frm <- as.formula(mpg ~ x1 + x2)
it is possible to extract the term on the left side simply using:
as.character(frm[[2]])
#[1] "mpg"
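This works as shown for a single response. As a hedged illustration of where it can surprise you, here is what the same indexing gives for a multi-term left side and for a one-sided formula (frm_multi and frm_one are hypothetical examples, not from the question):

```r
frm_multi <- mpg + y1 ~ x1 + x2  # hypothetical multi-term left-hand side
frm_one   <- ~ x1 + x2           # hypothetical one-sided formula

as.character(frm_multi[[2]])  # "+" "mpg" "y1": the call is split into its parts
deparse(frm_multi[[2]])       # "mpg + y1": the left side as a single string
length(frm_one)               # 2: there is no left side, so [[2]] is the right side
deparse(frm_one[[2]])         # "x1 + x2"
```

Checking length(frm) == 3 before indexing is a simple way to make sure a left side actually exists.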
Upvotes: 2