Reputation: 10954
How would I go about using mutate (my presumption is that I am looking for standard evaluation in my case, and hence mutate_, but I am not entirely confident on this point) when using a function that accepts a list of variable names, such as this:
createSum = function(data, variableNames) {
  data %>%
    mutate_(sumvar = interp(~ sum(var, na.rm = TRUE),
                            var = as.name(paste(as.character(variableNames), collapse = ","))))
}
Here is an MWE that strips the function to its core logic and demonstrates what I am trying to achieve:
library(dplyr)
library(lazyeval)
# function to make random table with given column names
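makeTable = function(colNames, sampleSize) {
  # one column of sampleSize standard-normal draws per requested column name
  liSample = lapply(colNames, function(week) rnorm(sampleSize))
  names(liSample) = as.character(colNames)
  # check.names = FALSE keeps the date strings as-is, even though they are not valid R names
  return(tbl_df(data.frame(liSample, check.names = FALSE)))
}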
# create some sample data with the column name patterns required
weekDates = seq.Date(from = as.Date("2014-01-01"),
                     to = as.Date("2014-08-01"), by = "week")
dfTest = makeTable(weekDates, 10)
# test mutate on this table
dfTest %>%
  mutate_(sumvar = interp(~ sum(var, na.rm = TRUE),
                          var = as.name(paste(as.character(weekDates), collapse = ","))))
The expected output is what would be returned by:
rowSums(dfTest[, as.character(weekDates)])
Upvotes: 6
Views: 3151
Reputation: 49448
I don't know if this is an "officially sanctioned" dplyr way, but this is a possibility:
weekDates = as.character(weekDates) # more convenient
dfTest %>% mutate(sumvar = Reduce(`+`, lapply(weekDates, get, .)))
#or
dfTest %>% mutate(sumvar = rowSums(as.data.frame(lapply(weekDates, get, .))))
This does carry potentially significant performance penalties, depending on your particular usage: in addition to dplyr's regular copying of the entire data, I think it also copies it a second time during that internal computation. You can look into data.table to avoid the extra copying by adding columns in place (and using .SDcols to avoid the second copy), and you'll get arguably better syntax.
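The Reduce() trick works because + is vectorised: folding it over a list of columns adds them element by element, which is the same as summing row by row. A minimal base-R sketch, using a small made-up data frame whose column names w1, w2, w3 are hypothetical, checks that it agrees with rowSums():

```r
# Hypothetical data frame; the column names w1, w2, w3 are made up for this sketch
df <- data.frame(w1 = c(1, 2, 3), w2 = c(10, 20, 30), w3 = c(100, 200, 300))
cols <- c("w1", "w2", "w3")

# Fold `+` over the selected columns: (w1 + w2) + w3, computed element-wise
viaReduce <- Reduce(`+`, df[cols])

# The same row sums computed directly
viaRowSums <- rowSums(df[cols])

# Both give 111, 222, 333 (unname() drops any row names rowSums() may attach)
stopifnot(isTRUE(all.equal(viaReduce, unname(viaRowSums))))
```

One difference worth keeping in mind: folding + propagates NA, whereas rowSums(..., na.rm = TRUE) can skip missing values, so rowSums() is the closer match to the na.rm = TRUE in the question.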
Upvotes: 1
Reputation: 206197
I think this is what you're after
createSum = function(data, variableNames) {
  data %>%
    mutate_(sumvar = paste(as.character(variableNames), collapse = "+"))
}
createSum(dfTest, weekDates)
Here we just supply a character value rather than using interp(), because you can't pass in a list of names as a single parameter to a function. Also, sum() would do some undesired collapsing, because operations are not performed row-wise: each column is passed in as a whole vector at a time.
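Both points can be seen in plain R. Passing a string to mutate_() has it treated as an R expression; base R's parse() followed by eval() is a rough analogue of that (a sketch of the idea, not dplyr's actual implementation). The data frame and its column names w1, w2, w3 below are hypothetical:

```r
# Hypothetical data frame; the column names w1, w2, w3 are made up for this sketch
df <- data.frame(w1 = c(1, 2, 3), w2 = c(10, 20, 30), w3 = c(100, 200, 300))
cols <- c("w1", "w2", "w3")

# The question's approach: as.name() turns the whole comma-joined string
# into ONE symbol, literally named "w1,w2,w3", which matches no column
oneSymbol <- as.name(paste(cols, collapse = ","))

# The answer's approach: build the text "w1+w2+w3" and have it parsed as code;
# `+` is vectorised, so evaluating it against the columns gives one value per row
plusText <- paste(cols, collapse = "+")
perRow <- eval(parse(text = plusText), envir = df)

# Wrapping the columns in sum() instead collapses everything into a single number
sumText <- paste0("sum(", paste(cols, collapse = ","), ")")
collapsed <- eval(parse(text = sumText), envir = df)

as.character(oneSymbol)  # "w1,w2,w3"
perRow                   # 111 222 333
collapsed                # 666
```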
The other problem with this example is that you set check.names = FALSE in your data.frame() call, which means that you've created column names (such as 2014-01-01) that are not valid R symbols. You can explicitly wrap your variable names in backticks if you like:
createSum(dfTest , paste0("`", weekDates,"`"))
but in general it would be better not to use invalid names.
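Two base-R calls make the naming issue concrete: make.names() shows what check.names = TRUE would have done to the date strings, and parse() shows why the backticks matter once the names are pasted into an expression. A small sketch with two hypothetical columns named after the first two dates in the question:

```r
dates <- as.character(as.Date(c("2014-01-01", "2014-01-08")))

# What data.frame(..., check.names = TRUE) would have done to these names
safeNames <- make.names(dates)  # "X2014.01.01" "X2014.01.08"

# Keep the invalid names, as the question does with check.names = FALSE
df <- data.frame(c(1, 2), c(10, 20))
names(df) <- dates

# Without backticks the text is parsed as plain arithmetic (2014 - 01 - 01 + ...),
# so the columns are never looked up at all
withoutBackticks <- eval(parse(text = paste(dates, collapse = "+")), envir = df)  # 4017

# With backticks each date string is read as a single symbol naming a column
withBackticks <- eval(parse(text = paste0("`", dates, "`", collapse = "+")), envir = df)  # 11 22
```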
Upvotes: 5