tchakravarty
tchakravarty

Reputation: 10954

dplyr: standard evaluation for mutate with quoted variable names

How would I go about using mutate (my presumption is that I am looking for standard evaluation in my case, and hence mutate_, but I am not entirely confident on this point) when using a function that accepts a list of variable names, such as this:

createSum = function(data, variableNames) {
  data %>% 
    mutate_(sumvar = interp(~ sum(var, na.rm = TRUE), 
                            var = as.name(paste(as.character(variableNames), collapse =","))))

}

Here is an MWE that strips the function to its core logic and demonstrates what I am trying to achieve:

library(dplyr)
library(lazyeval)

# function to make random table with given column names
makeTable = function(colNames, sampleSize) {
  liSample = lapply(colNames, function(week) {
    sample = rnorm(sampleSize)
  })
  names(liSample) = as.character(colNames)
  return(tbl_df(data.frame(liSample, check.names = FALSE)))
}

# create some sample data with the column name patterns required
weekDates = seq.Date(from = as.Date("2014-01-01"),
                     to = as.Date("2014-08-01"), by = "week")
dfTest = makeTable(weekDates, 10)

# test mutate on this table
dfTest %>% 
  mutate_(sumvar = interp(~ sum(var, na.rm = TRUE), 
                          var = as.name(paste(as.character(weekDates), collapse =","))))

Expected output here is what would be returned by:

rowSums(dfTest[, as.character(weekDates)])

Upvotes: 6

Views: 3151

Answers (2)

eddi
eddi

Reputation: 49448

I don't know if this is an "officially sanctioned" dplyr way, but this is a possibility:

weekDates = as.character(weekDates) # more convenient

dfTest %>% mutate(sumvar = Reduce(`+`, lapply(weekDates, get, .)))
#or
dfTest %>% mutate(sumvar = rowSums(as.data.frame(lapply(weekDates, get, .))))

This does carry potentially significant performance penalties, depending on your particular usage - in addition to dplyr's regular copying of the entire data I think it also copies it a second time during that internal computation. You can look into data.table to avoid the extra copying around by adding columns in place (and using .SDcols to avoid the second copy) + you'll get arguably better syntax.

Upvotes: 1

MrFlick
MrFlick

Reputation: 206197

I think this is what you're after

createSum = function(data, variableNames) {
    data %>% 
        mutate_(sumvar = paste(as.character(variableNames), collapse ="+"))
}
createSum(dfTest, weekDates)

where we just supply a character value rather than interp because you can't pass in a list of names as a single parameter to a function. Plus, sum() would do some undesired collapsing because operations are not performed rowwise, they are passed in columns of vectors at a time.

The other problem with this example is that you set check.names=FALSE in your data.frame which means that you've created column names that cannot be valid symbols. You can explicitly wrap your variable names in back-ticks if you like

createSum(dfTest , paste0("`", weekDates,"`"))

but in general it would be better not to use invalid names.

Upvotes: 5

Related Questions