Reputation: 1001
For example, I have a formula like this:
main_var ~ 0 + var1:x + var2:y + var3 + + var4 + (0 + main_var|x_y) + (0 + add_var|x_y) + (1|x_y)
How can I remove the two consecutive pluses (+ +) between var3 and var4 (and leave only one)?
Upvotes: 0
Views: 67
Reputation: 43334
It's possible to edit a formula's component parts without coercing to string. Formulas contain two parts, an expression (the part you write) and an environment (where you write it, maybe with variables in it referred to in the expression). The environment we want to hold on to; the expression we want to change.
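To make the expression/environment split concrete, here is a minimal sketch. The names make_formula, local_var and g are hypothetical, chosen only for illustration; the point is that a formula remembers the environment it was created in, separately from the call it stores.

```r
# Hypothetical helper: builds a formula inside a function, so the formula's
# environment is that function's execution environment, not the global one
make_formula <- function() {
  local_var <- 1          # exists only inside this call
  y ~ x + local_var
}

g <- make_formula()

# The expression part: a call to `~` with a left- and a right-hand side
g[[1]]
g[[3]]

# The environment part: it still holds local_var after the function returned
env <- environment(g)
exists("local_var", envir = env, inherits = FALSE)
```

Dropping or replacing that environment would change where variables named in the formula are looked up, which is why it is worth carrying it along when editing the expression.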
Expressions (by which here I mean language objects like symbols and calls, not the narrowly-defined expression class) are syntax trees, which behave a bit like lists. They can be subset:
f <- main_var ~ 0 + var1:x + var2:y + var3 + + var4 + (0 + main_var|x_y) + (0 + add_var|x_y) + (1|x_y)
f[[1]]
#> `~`
f[[2]]
#> main_var
f[[3]]
#> 0 + var1:x + var2:y + var3 + +var4 + (0 + main_var | x_y) + (0 +
#> add_var | x_y) + (1 | x_y)
f[[3]][[3]]
#> (1 | x_y)
and therefore iterated upon. Because they're tree-like structures, to iterate over the whole tree, we need to recurse. Most of the function is pretty typical for recursion (return atomic leaf nodes; recurse over nodes with children), but the tricky part is the condition to identify the part we want to change. If you look at the node in question, it contains a unary (one-argument) `+` call:
f <- main_var ~ 0 + var1:x + var2:y + var3 + + var4 + (0 + main_var|x_y) + (0 + add_var|x_y) + (1|x_y)
f[[3]][[2]][[2]][[2]][[3]]
#> +var4
f[[3]][[2]][[2]][[2]][[3]][[1]]
#> `+`
f[[3]][[2]][[2]][[2]][[3]][[2]]
#> var4
All other `+` calls are binary. We can thus check for length-2 nodes where the first element is `+`. As it turns out, getting a bare `+` symbol is also a bit tricky; the simplest is expression(`+`)[[1]] or quote(+1)[[1]], but once you have that, equality checking works as usual.
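As a small sketch of that check in isolation: the helper name is_unary_plus is hypothetical and not part of the function below, and as.name("+") is shown as one more way to get the same symbol.

```r
# Three ways to obtain the bare `+` symbol
plus_a <- expression(`+`)[[1]]
plus_b <- quote(+1)[[1]]
plus_c <- as.name("+")

# Symbols with the same name are the same object, so these compare equal
identical(plus_a, plus_b)
identical(plus_a, plus_c)

# Hypothetical helper: TRUE for a call to `+` with exactly one argument
is_unary_plus <- function(node) {
  is.call(node) && length(node) == 2 && identical(node[[1]], as.name("+"))
}

is_unary_plus(quote(+var4))         # unary call
is_unary_plus(quote(var3 + var4))   # binary call
is_unary_plus(quote(var4))          # plain symbol
```

Using identical() rather than == avoids relying on how == treats language objects, at the cost of being slightly more verbose.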
Putting the pieces together, and cleaning up by coercing pieces back to expressions and formulas,
remove_unary_plus <- function(expr){
    if (length(expr) == 1) {
        # return atomic elements
        return(expr)
    } else if (length(expr) == 2 && expr[[1]] == expression(`+`)[[1]]) {
        # for unary plus calls, return the argument without the plus
        return(expr[[2]])
    } else {
        # otherwise recurse, simplifying the results back to a language object
        clean_expr <- as.call(lapply(expr, remove_unary_plus))
        # if it's a formula, hold on to the environment
        if (inherits(expr, "formula")) {
            clean_expr <- as.formula(clean_expr, env = environment(expr))
        }
        return(clean_expr)
    }
}
f_clean <- remove_unary_plus(f)
f_clean
#> main_var ~ 0 + var1:x + var2:y + var3 + var4 + (0 + main_var |
#> x_y) + (0 + add_var | x_y) + (1 | x_y)
And look, it keeps its environment:
str(f)
#> Class 'formula' language main_var ~ 0 + var1:x + var2:y + var3 + +var4 + (0 + main_var | x_y) + (0 + add_var | x_y) + (1 | x_y)
#> ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
str(f_clean)
#> Class 'formula' language main_var ~ 0 + var1:x + var2:y + var3 + var4 + (0 + main_var | x_y) + (0 + add_var | x_y) + (1 | x_y)
#> ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
Obviously this is a bit of a pain for day-to-day formula manipulation, but, well, it's possible, maybe useful for programmatic usage, and (to me, at least) interesting.
Upvotes: 2
Reputation: 226087
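Something like
as.formula(gsub("\\+\\s*\\+", "+", paste(deparse(f), collapse = " ")), env = environment(f))
where f is your formula. The paste() is needed because deparse() can split a long formula across several strings.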
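A minimal sketch of the string approach as a reusable function, assuming the formula contains no other place where two pluses may legitimately sit side by side. The name drop_double_plus is hypothetical.

```r
# Hypothetical wrapper around the string-based approach
drop_double_plus <- function(f) {
  # deparse() may return several strings for a long formula; join them
  txt <- paste(deparse(f), collapse = " ")
  # collapse "+ +" (with any whitespace in between) into a single "+"
  txt <- gsub("\\+\\s*\\+", "+", txt)
  # rebuild the formula, reusing the original formula's environment
  as.formula(txt, env = environment(f))
}

f <- main_var ~ 0 + var1:x + var2:y + var3 + + var4 + (0 + main_var|x_y) + (0 + add_var|x_y) + (1|x_y)
f_str <- drop_double_plus(f)
f_str

# The environment is carried over from the original formula
identical(environment(f_str), environment(f))
```

Unlike the tree-walking approach, this works on the text of the formula, so it is shorter but only as reliable as the regular expression.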
Upvotes: 1