Reputation: 1580
Say I have a dataframe df with dozens of identifying variables (in columns) and only a few measured variables (also in columns).
To avoid repetitively typing all the variables for each argument, I assign the names of the identifying and measured df columns to df_id and df_measured, respectively. It's easy enough to pass these vectors to shorten the argument inputs for melt ...
df.m <- melt(df, id.vars = df_id, measure.vars = df_measured)
... but I'm at a loss for how to enter the formula = argument in dcast using the same method to specify my id variables, since it requires that the input point to numeric positions of the columns.
Do I have to make a vector of numeric positions similar to df_id and risk broken functionality of my program if my input columns change in order, or can I refer to them by name and somehow still get that to work in the formula = argument? Thanks.
Upvotes: 6
Views: 3403
Reputation: 1659
For people using base R, sprintf() works just as well as glue::glue():
vars_to_use <- c("Petal.Length", "Sepal.Length")
as.formula(sprintf("Species ~ %s",
paste(vars_to_use, collapse = " + ")))
## Species ~ Petal.Length + Sepal.Length
As a bonus, replacing paste() with sprintf() where you can may give a small performance improvement, though both are implemented in C, so benchmark before relying on it.
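Applied to the shape of formula the question asks about (id columns on the left, the melted variable on the right), the same idea can be sketched as follows. This is a minimal sketch, assuming df_id holds the id column names as in the question and that the melted data uses "variable", the default column name from reshape2::melt():

```r
# Assumed names from the question: df_id holds the identifying column names.
df_id <- c("month", "day")

# Collapse the id columns into the left-hand side; "variable" is the
# default name reshape2::melt() gives the melted variable column.
f <- as.formula(sprintf("%s ~ variable", paste(df_id, collapse = " + ")))

class(f)     # "formula"
all.vars(f)  # "month" "day" "variable"
```

Because the formula is built from names, it does not depend on where those columns sit in the data frame.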
Upvotes: 0
Reputation: 266
The function glue(), exported from the Tidyverse package glue, makes the formula easier to build than with paste(). Here's what glue() does:
a <- 1
b <- 2
glue( "{a} + {b} = {a+b}." )
returns the string
1 + 2 = 3.
So glue() takes its argument verbatim, but substitutes names and other expressions in curly brackets. See the glue documentation for the full spec: glue() has other arguments, including more strings, an argument that gives the environment in which to look up variables, and two arguments that change the curly brackets to other delimiters.
As far as dcast() is concerned, glue() avoids the extra quotes and commas that you have to use with paste(). Here's an example, using your table:
# install.packages( "glue" )  # run once, if glue is not already installed
library( glue )
library( data.table )
dt <- data.table( c1 = c( 1 , 1 , 1 , 2 , 2 , 2 )
, c2 = c( "A", "B", "C", "A1", "B1", "C1" )
, c3 = c( 1 , 2 , 3 , 1 , 2 , 3 )
)
f1 <- function( d, col_name1, col_name2, col_name3 ) {
dcast( d, glue( "{col_name1} ~ {col_name3}" ), value.var = col_name2 )
}
f1( dt, "c1", "c2", "c3" )
And here's its output (on R 3.6.3):
> f1( dt, "c1", "c2", "c3" )
c1 1 2 3
1: 1 A B C
2: 2 A1 B1 C1
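The example above interpolates one column name per side. For a vector of id columns, as in the question, the names need to be joined with " + " first. A minimal sketch, assuming the glue package is installed and using df_id and "variable" as illustrative names taken from the question and from melt's defaults:

```r
library(glue)

# Assumed names from the question: several identifying columns.
df_id <- c("month", "day")

# glue_collapse() joins the names with " + " before glue() interpolates them.
lhs <- glue_collapse(df_id, sep = " + ")
f <- as.formula(glue("{lhs} ~ variable"))

all.vars(f)
```

The resulting formula can then be passed to dcast() in place of the hard-coded string.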
Upvotes: 0
Reputation: 193547
You can use as.formula to construct a formula. Here's an example:
library(reshape2)
## Example from `melt.data.frame`
names(airquality) <- tolower(names(airquality))
df_id <- c("month", "day")
aq <- melt(airquality, id.vars = df_id)
## Constructing the formula
f <- as.formula(paste(paste(df_id, collapse = " + "), "~ variable"))
## Applying it....
dcast(aq, f, value.var = "value", fun.aggregate = mean)
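If the same construction is needed in several places, it could be wrapped in a small helper. The sketch below is only an illustration: build_cast_formula is a hypothetical name, not part of reshape2 or data.table, and it backtick-quotes each name so that column names containing spaces still parse. It only demonstrates building the formula; how a particular dcast() implementation treats backtick-quoted names is not checked here.

```r
# Hypothetical helper (not part of reshape2 or data.table): builds
# "id1 + id2 + ... ~ rhs" from character vectors of column names.
build_cast_formula <- function(id_vars, rhs = "variable") {
  # Backtick-quote each name so non-syntactic names (e.g. with spaces) parse.
  quote_name <- function(x) sprintf("`%s`", x)
  lhs <- paste(quote_name(id_vars), collapse = " + ")
  as.formula(paste(lhs, "~", paste(quote_name(rhs), collapse = " + ")))
}

f2 <- build_cast_formula(c("month", "day"))
all.vars(f2)  # "month" "day" "variable"

f3 <- build_cast_formula(c("site id", "day"))
all.vars(f3)  # "site id" "day" "variable"
```

Since the columns are referenced by name, reordering the input columns does not affect the formula.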
Upvotes: 13