Reputation: 779
Given a data.frame and a vector containing only -1, 0 and 1, with length equal to the number of columns of the data.frame, is there a natural way to turn the vector into a formula in which the columns at positions holding -1 appear on the left side and the columns at positions holding 1 appear on the right side?
For example, given the following data.frame
df = data.frame(
'a' = rnorm(10),
'b' = rnorm(10),
'c' = rnorm(10),
'd' = rnorm(10),
'e' = rnorm(10))
and the following vector vec = c(-1,-1,0,1,1), is there a natural way to build the formula a+b~d+e?
Upvotes: 0
Views: 810
Reputation: 270298
We assume that if there are no 1's in vec, the right hand side should be 1, and that if there are no -1's in vec, the left hand side is empty.
Each alternative produces a character string; if a formula class object is wanted, use formula(s), where s is that string.
1) paste each side. Subset the names where vec is -1 to get LHS and paste/collapse them with +, do the same where vec is 1 to get RHS, and then paste the two sides together with ~. If we knew that there was at least one 1 in vec, we could omit the if statement. Of the solutions here, this seems the most straightforward.
nms <- names(df)
LHS <- paste(nms[vec == -1], collapse = "+")
RHS <- paste(nms[vec == 1], collapse = "+")
if (RHS == "") RHS <- "1"
paste0(LHS, "~", RHS)
## [1] "a+b~d+e"
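As a minimal sketch of the formula(s) conversion mentioned above, approach (1) can be wrapped in a small function that returns a formula object directly. The function name vec_to_formula and its argument names are hypothetical, chosen only for illustration; the edge-case handling follows the assumptions stated at the top of this answer.

```r
# Hypothetical helper (the name is an assumption, not part of the answer above):
# build a formula from column names and a vector of -1/0/1 codes.
vec_to_formula <- function(nms, vec) {
  lhs <- paste(nms[vec == -1], collapse = "+")  # names coded -1 go on the left
  rhs <- paste(nms[vec == 1], collapse = "+")   # names coded  1 go on the right
  if (rhs == "") rhs <- "1"                     # no 1's: intercept-only right side
  formula(paste0(lhs, "~", rhs))                # convert the string to a formula
}

nms <- c("a", "b", "c", "d", "e")
f <- vec_to_formula(nms, c(-1, -1, 0, 1, 1))
class(f)      # "formula"
all.vars(f)   # "a" "b" "d" "e"

# With no -1's the left side is empty, giving a one-sided formula
f_one_sided <- vec_to_formula(nms, c(0, 0, 0, 1, 1))
length(f_one_sided)  # 2
```

Returning a formula rather than a string means the result can be passed straight to modelling functions that accept one.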
2) sapply. Alternatively, combine the LHS and RHS lines into a single sapply. If we knew that there was at least one 1 in vec, we could simplify the code by omitting the if statement. This approach is shorter than (1).
sa <- sapply(c(-1, 1), function(x) paste(names(df)[vec == x], collapse = "+"))
if (sa[2] == "") sa[2] <- "1"
paste0(sa[1], "~", sa[2])
## [1] "a+b~d+e"
3) tapply. Alternatively, we can combine the LHS and RHS lines into a single tapply like this:
ta <- tapply(names(df), vec, paste, collapse = "+")
paste0(if (any(vec == -1)) ta[["-1"]], "~", if (any(vec == 1)) ta[["1"]] else 1)
## [1] "a+b~d+e"
If we knew that -1 and 1 each appear at least once in vec, we could simplify the last line to:
paste0(ta[["-1"]], "~", ta[["1"]])
## [1] "a+b~d+e"
Overall, this approach is the shortest if we can guarantee that vec contains at least one 1 and at least one -1. Otherwise, handling the edge cases seems somewhat cumbersome compared to the other approaches.
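To illustrate why the guarded form of the tapply line is needed, here is a small sketch using a hypothetical input with no 1's (the variable name vec_no_rhs is an assumption for illustration). Looking up a group that tapply never created fails, while the guarded version falls back to a right hand side of 1.

```r
nms <- c("a", "b", "c")
vec_no_rhs <- c(-1, -1, 0)  # hypothetical input: no 1's, so no "1" group exists

ta <- tapply(nms, vec_no_rhs, paste, collapse = "+")

# Unguarded lookup of the missing "1" group raises an error; capture it
res <- tryCatch(ta[["1"]], error = function(e) e)
inherits(res, "error")
## [1] TRUE

# Guarded version from approach (3): falls back to 1 on the right side
s <- paste0(if (any(vec_no_rhs == -1)) ta[["-1"]], "~",
            if (any(vec_no_rhs == 1)) ta[["1"]] else 1)
s
## [1] "a+b~1"
```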
Upvotes: 1
Reputation: 887951
We could do this with a grouped paste, using aggregate to paste the names within each value of vec:
paste(aggregate(nm ~ vec, subset(data.frame(nm = names(df), vec,
stringsAsFactors = FALSE), vec != 0),
FUN = paste, collapse= ' + ')[['nm']], collapse=' ~ ')
#[1] "a + b ~ d + e"
Another option is tapply:
paste(tapply(names(df), vec, FUN = paste,
collapse= ' + ')[c('-1', '1')], collapse= ' ~ ')
#[1] "a + b ~ d + e"
Upvotes: 0