jessexknight

Reputation: 825

asterisk in formula with many variables: how to limit order of interactions?

Suppose I have a vector of variable names x = c('a','b','c','d','e') for a statistical model. When building the formula, it's nice to use something like paste('y ~',paste(x,collapse=' + ')) to get y ~ a + b + c + d + e, especially when x may change.

Now I'd like to do the same thing with interaction terms, but paste(x,collapse=' : ') produces a : b : c : d : e, which is only one term, and paste(x,collapse=' * ') produces a * b * c * d * e, which includes all possible interactions across all orders -- i.e. a + b + c + ... + a:b + a:c + ... + a:b:c + a:b:d + ... + a:b:c:d:e. How can I limit the interaction terms to a given order, say 2nd order (e.g. a:b)?

Upvotes: 2

Views: 125

Answers (3)

Mikael Jagan

Reputation: 11326

reformulate handles this problem quite naturally, though how you would apply it is context-dependent.

If you want to drop interactions of order greater than order_max from an existing formula, then you can do:

f1 <- function(formula, order_max) {
    a <- attributes(terms(formula))
    reformulate(termlabels = a$term.labels[a$order <= order_max], 
                response = if (r <- a$response) a$variables[[1L + r]],
                intercept = a$intercept,
                env = environment(formula))
}

f1(y ~ a * b * c * d * e, 2L)
## y ~ a + b + c + d + e + a:b + a:c + b:c + a:d + b:d + c:d + a:e + 
##     b:e + c:e + d:e
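
To see what f1 relies on, it may help to look at the terms object directly: its term.labels and order attributes line up one-to-one, so subsetting the labels by order picks out the terms to keep. A minimal sketch (the three-variable formula is only an illustrative choice, not part of the original question):

```r
# Inspect the two attributes that f1 filters on
tt   <- terms(y ~ a * b * c)
labs <- attr(tt, "term.labels")
ord  <- attr(tt, "order")

# Each term label paired with its interaction order
print(data.frame(label = labs, order = ord))

# Keep only the terms up to second order
keep <- labs[ord <= 2L]
print(keep)
```

The filtered labels are what f1 passes on to reformulate as termlabels.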

If you have a character vector x listing names of variables, and you want to construct a formula containing their interactions up to order order_max, then you can do:

Edit: Never mind - follow @RitchieSacramento's suggestion and use the ^ operator in this case.

f2 <- function(x, order_max, response = NULL, intercept = TRUE, env = parent.frame()) {
    paste1 <- function(x) paste0(x, collapse = ":")
    combn1 <- function(n) if (n > 1L) combn(x, n, paste1) else x
    termlabels <- unlist(lapply(seq_len(order_max), combn1), FALSE, FALSE)
    reformulate(termlabels = termlabels, response = response,
                intercept = intercept, env = env)
}

f2(letters[1:5], 2L, response = quote(y))
## y ~ a + b + c + d + e + a:b + a:c + a:d + a:e + b:c + b:d + b:e + 
##     c:d + c:e + d:e

To be parsed correctly, nonsyntactic variable names must be protected with backquotes:

f2(c("`!`", "`?`"), 1L, response = quote(`#`))
## `#` ~ `!` + `?`
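
If the names arrive unquoted, one option is to add the backquotes programmatically before calling reformulate. A rough sketch, where bq is a hypothetical helper and the variable names are made up for illustration; it assumes the names do not themselves contain a backquote:

```r
# Hypothetical helper: wrap each name in backquotes so it parses as one symbol
bq <- function(x) paste0("`", x, "`")

# Nonsyntactic names (illustrative only)
vars <- c("my var", "x-1")

fm <- reformulate(bq(vars), response = quote(`y val`))
print(fm)

# all.vars() returns the names without the protective backquotes
print(all.vars(fm))
```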

Upvotes: 3

lroha

Reputation: 34511

The most straightforward way to achieve this, assuming you want to cross all terms to a specified degree, is to use the ^ operator in the formula.

x = c('a','b','c','d','e')

# Build formula using reformulate
(fm <- reformulate(x, "y"))
y ~ a + b + c + d + e

# Cross to second degree  
(fm2 <- update(fm, ~ .^2))
y ~ a + b + c + d + e + a:b + a:c + a:d + a:e + b:c + b:d + b:e + 
    c:d + c:e + d:e

# Terms of fm2 as character:
attr(terms.formula(fm2), "term.labels")
[1] "a"   "b"   "c"   "d"   "e"   "a:b" "a:c" "a:d" "a:e" "b:c" "b:d" "b:e" "c:d" "c:e" "d:e"

# Cross to third degree
(fm3 <- update(fm, ~ .^3))
y ~ a + b + c + d + e + a:b + a:c + a:d + a:e + b:c + b:d + b:e + 
    c:d + c:e + d:e + a:b:c + a:b:d + a:b:e + a:c:d + a:c:e + 
    a:d:e + b:c:d + b:c:e + b:d:e + c:d:e
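
If you would rather stay with the paste approach from the question, the same ^ operator can go straight into the string. This is only a sketch; order_max is an illustrative name for the highest interaction order you want:

```r
x <- c('a', 'b', 'c', 'd', 'e')
order_max <- 2   # illustrative name: highest interaction order to include

# Build "y ~ (a + b + c + d + e)^2" as a string, then convert it to a formula
rhs <- paste0('(', paste(x, collapse = ' + '), ')^', order_max)
fm  <- as.formula(paste('y ~', rhs))
print(fm)

# terms() expands the ^ operator into the individual term labels
labs <- attr(terms(fm), 'term.labels')
print(labs)
```

The parentheses matter: without them, ^ would apply only to the last variable rather than to the whole sum.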

Upvotes: 4

jessexknight

Reputation: 825

Here is another solution to create the : terms:

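iterms = function(x,n,lower=TRUE){
  # orders to include: 1..n if lower=TRUE, otherwise only n
  orders = if (lower) seq_len(n) else n
  terms = lapply(orders,function(ni){
    # every ni-way combination of x, each joined with ':'
    combn(x,ni,paste,collapse=':')
  })
  return(paste(unlist(terms),collapse=' + '))
}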

Testing with:

x = c('a','b','c','d')
print(iterms(x,1))
print(iterms(x,2))
print(iterms(x,3))
print(iterms(x,3,lower=FALSE))

yields:

[1] "a + b + c + d"
[1] "a + b + c + d + a:b + a:c + a:d + b:c + b:d + c:d"
[1] "a + b + c + d + a:b + a:c + a:d + b:c + b:d + c:d + a:b:c + a:b:d + a:c:d + b:c:d"
[1] "a:b:c + a:b:d + a:c:d + b:c:d"
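
As a sanity check, the string from iterms can be pasted into a formula and compared against the ^ approach from the other answer. This is a sketch only: iterms is copied from above so the snippet runs on its own, and fm_paste / fm_caret are illustrative names.

```r
# iterms() copied from above so this snippet is self-contained
iterms = function(x,n,lower=TRUE){
  return(paste(lapply(ifelse(lower,1,n):n,function(ni){
    paste(apply(combn(x,ni),2,paste,collapse=':'),collapse=' + ')
  }),collapse=' + '))
}

x = c('a','b','c','d')

# Formula built from the iterms() string
fm_paste = as.formula(paste('y ~', iterms(x,2)))

# Formula built with the ^ operator
fm_caret = update(reformulate(x,'y'), ~ .^2)

# Compare the expanded term labels of the two formulas
labs_paste = attr(terms(fm_paste),'term.labels')
labs_caret = attr(terms(fm_caret),'term.labels')
print(setequal(labs_paste, labs_caret))
```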

Upvotes: 0
