Reputation: 1580

How to use a character vector of column names in the formula argument of dcast (reshape2)

Say I have a dataframe df with dozens of identifying variables (in columns) and only a few measured variables (also in columns).

To avoid repetitively typing all the variables for each argument, I assign the names of the identifying and measured df columns to df_id and df_measured, respectively. It's easy enough to input these vectors to shorten the argument inputs for melt...

df.m  <- melt(df, id.vars = df_id, measure.vars = df_measured)

... but I'm at a loss for how to enter the formula = argument in dcast using the same method to specify my id variables since it requires that the input point to numeric positions of the columns.

Do I have to make a vector of numeric positions similar to df_id and risk broken functionality of my program if my input columns change in order, or can I refer to them by name and somehow still get that to work in the formula = argument? Thanks.

Upvotes: 6

Answers (3)

DuckPyjamas

Reputation: 1659

For people sticking to base R, sprintf() works as an alternative to glue::glue():

vars_to_use <- c("Petal.Length", "Sepal.Length")

as.formula(sprintf("Species ~ %s", 
                   paste(vars_to_use, collapse = " + ")))

## Species ~ Petal.Length + Sepal.Length

A side benefit of sprintf() is that the whole formula template stays readable in one string. Don't expect a speed-up over paste(), though: both are implemented in C, and building a formula string is cheap either way.
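To tie this back to the question, here is a minimal sketch of the same sprintf() pattern with the id columns on the left-hand side. The vector df_id is a stand-in for the question's df_id (the column names are assumed for illustration), and the dcast() call is left as a comment because it needs the melted data.

```r
# Stand-in for the question's df_id (column names assumed for illustration)
df_id <- c("month", "day")

# Collapse the names into "month + day", then slot them into the template
f <- as.formula(sprintf("%s ~ variable", paste(df_id, collapse = " + ")))

# f is now the formula month + day ~ variable and can be passed on, e.g.
# dcast(df.m, f, value.var = "value")
```

Because the formula refers to columns by name, it keeps working if the columns of the input data frame change order.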

Upvotes: 0

Phil van Kleur

Reputation: 266

The function glue(), exported from the Tidyverse package glue, makes the formula easier to build than with paste(). Here's what glue() does:

a <- 1
b <- 2
glue( "{a} + {b} = {a+b}." )

returns the string

1 + 2 = 3.

So glue() takes its argument verbatim, but substitutes the values of names and other expressions in curly brackets. See the glue documentation for the full spec: glue() has other arguments, including more strings, an argument that gives the environment in which to look up variables (.envir), and two arguments that change the curly brackets to other delimiters (.open and .close). As far as dcast() is concerned, it avoids the extra quotes and commas that you have to use with paste(). Here's an example with a small data.table:

install.packages( "glue" )
library( glue )

library( data.table ) 

dt <- data.table( c1 = c( 1  , 1  , 1  , 2   , 2   , 2    )    
                , c2 = c( "A", "B", "C", "A1", "B1", "C1" )
                , c3 = c( 1  , 2  , 3  , 1   , 2   , 3    )
                )

f1 <- function( d, col_name1, col_name2, col_name3 ) {
  dcast( d, glue( "{col_name1} ~ {col_name3}" ), value.var = col_name2 )
}

f1( dt, "c1", "c2", "c3" )

And here's its output (on R 3.6.3):

> f1( dt, "c1", "c2", "c3" )
   c1  1  2  3
1:  1  A  B  C
2:  2 A1 B1 C1
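The example above passes single column names. For a whole vector of id columns, as in the question, one possible sketch is to join the names with glue_collapse() first. The vector df_id and the right-hand side "variable" are assumptions for illustration, and the explicit as.formula() call is a precaution so that the result is a formula, not a glue string.

```r
library(glue)

# Stand-in for the question's df_id (column names assumed for illustration)
df_id <- c("month", "day")

# glue_collapse() joins the vector into a single string: "month + day"
lhs <- glue_collapse(df_id, sep = " + ")

# Interpolate the joined names, then convert the string to a formula
f <- as.formula(glue("{lhs} ~ variable"))

# f can then be passed on, e.g.
# dcast(df.m, f, value.var = "value")
```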

Upvotes: 0

A5C1D2H2I1M1N2O1R2T1

Reputation: 193547

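You can use as.formula() to build the formula from a string, so the id columns can stay in a character vector and are matched by name, not by position.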

Here's an example:

library(reshape2)
## Example from `melt.data.frame`
names(airquality) <- tolower(names(airquality))
df_id <- c("month", "day")
aq <- melt(airquality, id = df_id)

## Constructing the formula
f <- as.formula(paste(paste(df_id, collapse = " + "), "~ variable"))

## Applying it....
dcast(aq, f, value.var = "value", fun.aggregate = mean)
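If the pattern is needed in several places, it could be wrapped in a small helper. The sketch below is hypothetical (cast_formula() is not part of reshape2) and assumes syntactic column names. It optionally checks the names against the data, which gives an early error if a column is renamed or dropped.

```r
# Hypothetical helper (not part of reshape2): build "a + b ~ c" from
# character vectors, optionally checking the names against a data frame.
cast_formula <- function(lhs, rhs = "variable", data = NULL) {
  if (!is.null(data)) {
    unknown <- setdiff(c(lhs, rhs), names(data))
    if (length(unknown) > 0) {
      stop("Columns not found in data: ", paste(unknown, collapse = ", "))
    }
  }
  as.formula(
    paste(paste(lhs, collapse = " + "), "~", paste(rhs, collapse = " + ")),
    env = parent.frame()
  )
}

# Tiny stand-in for a melted data frame
long <- data.frame(month = 5, day = 1:2, variable = "ozone", value = c(41, 36))

f <- cast_formula(c("month", "day"), data = long)
# f is month + day ~ variable; it could be passed on as
# dcast(long, f, value.var = "value")

# Reordering the columns gives the same formula, since only names are used
f2 <- cast_formula(c("month", "day"), data = long[, rev(names(long))])
```

Column names that are not syntactic R names (for example, names containing spaces) would need backtick quoting before being pasted into the string.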

Upvotes: 13
