marc1s

Reputation: 779

Converting a vector into formula

Given a data.frame and a vector containing only -1, 0, and 1, with length equal to the number of columns of the data.frame, is there a natural way to transform the vector into a formula in which the columns at positions holding -1 appear on the left side and those at positions holding 1 appear on the right side?

For example, given the following data.frame

df = data.frame(
  'a' = rnorm(10),
  'b' = rnorm(10),
  'c' = rnorm(10),
  'd' = rnorm(10),
  'e' = rnorm(10))

and the following vector: vec = c(-1,-1,0,1,1).

Is there a natural way to build the formula a+b~d+e?

Upvotes: 0

Views: 810

Answers (2)

G. Grothendieck

Reputation: 270298

We assume that if there are no 1's in vec, the right-hand side should be 1, and that if there are no -1's in vec, the left-hand side is empty.

Each alternative produces a character string; if an object of class formula is wanted, use formula(s), where s is that string.
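
As a minimal sketch of that conversion, assuming the string has already been built (here it is hard-coded to the result the alternatives produce for the question's example):

```r
# s is hard-coded for illustration; in practice it comes from one of the alternatives
s <- "a+b~d+e"

# formula() accepts a character string and returns an object of class "formula"
f <- formula(s)
class(f)
## [1] "formula"

# all.vars() lists the variables on both sides of the formula
all.vars(f)
## [1] "a" "b" "d" "e"
```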

1) paste each side. Subset the names corresponding to the -1 entries of vec to get LHS and paste/collapse them, do the same with the 1 entries to get RHS, and then paste the two together with ~ in between. If we knew that there was at least one 1 in vec, we could omit the if statement. Of the solutions here, this seems the most straightforward.

nms <- names(df)
LHS <- paste(nms[vec == -1], collapse = "+")
RHS <- paste(nms[vec == 1], collapse = "+")
if (RHS == "") RHS <- "1"
paste0(LHS, "~", RHS)
## [1] "a+b~d+e"

2) sapply. Alternatively, combine the LHS and RHS lines into a single sapply. If we knew that there was at least one 1 in vec, we could simplify the code by omitting the if statement. This approach is shorter than (1).

sa <- sapply(c(-1, 1), function(x) paste(names(df)[vec == x], collapse = "+"))
if (sa[2] == "") sa[2] <- "1"
paste0(sa[1], "~", sa[2])
## [1] "a+b~d+e"

3) tapply. We can alternatively combine the LHS and RHS lines into a single tapply like this:

ta <- tapply(names(df), vec, paste, collapse = "+")
paste0(if (any(vec == -1)) ta[["-1"]], "~", if (any(vec == 1)) ta[["1"]] else 1)
## [1] "a+b~d+e"

If we knew that -1 and 1 each appear at least once in vec, we could simplify the last line to:

paste0(ta[["-1"]], "~", ta[["1"]])
## [1] "a+b~d+e"

Overall, this approach is the shortest if we can guarantee at least one 1 and at least one -1; otherwise, handling the edge cases seems somewhat cumbersome compared to the other approaches.
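
To see the edge cases side by side, approach (1) could be wrapped in a small function. This is only a sketch: the name vec_to_formula and the input checks are assumptions made for illustration, not part of base R or of the code above, and it assumes the column names are syntactically valid R names.

```r
# Hypothetical helper (not part of base R): wraps approach (1) and returns a formula.
vec_to_formula <- function(nms, vec) {
  # Assumed validation: one entry of vec per name, and only -1, 0 or 1 allowed
  stopifnot(length(nms) == length(vec), all(vec %in% c(-1, 0, 1)))
  LHS <- paste(nms[vec == -1], collapse = "+")
  RHS <- paste(nms[vec == 1], collapse = "+")
  if (RHS == "") RHS <- "1"  # no 1's in vec: use a right-hand side of 1
  formula(paste0(LHS, "~", RHS))
}

nms <- c("a", "b", "c", "d", "e")

deparse(vec_to_formula(nms, c(-1, -1, 0, 1, 1)))
## [1] "a + b ~ d + e"

# no -1's: the left-hand side is empty
deparse(vec_to_formula(nms, c(0, 0, 0, 1, 1)))
## [1] "~d + e"

# no 1's: the right-hand side becomes 1
deparse(vec_to_formula(nms, c(-1, -1, 0, 0, 0)))
## [1] "a + b ~ 1"
```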

Upvotes: 1

akrun

Reputation: 887951

We could do this by grouping the column names by vec with aggregate and pasting each group together:

paste(aggregate(nm ~ vec, subset(data.frame(nm = names(df), vec, 
    stringsAsFactors = FALSE), vec != 0),
    FUN = paste, collapse= ' + ')[['nm']], collapse=' ~ ')
#[1] "a + b ~ d + e"

Another option is tapply:

paste(tapply(names(df), vec, FUN = paste, 
        collapse= ' + ')[c('-1', '1')], collapse= ' ~ ')
#[1] "a + b ~ d + e"

Upvotes: 0
