I'm trying to use dplyr inside a function, passing in a column name as a string variable so that it can then be used with n_distinct() inside summarize().
I understand that programming with dplyr has become easier with the standard-evaluation verbs such as summarize_() and arrange_(), as described in vignette("nse"). I've also tried various combinations of interp() from lazyeval. n_distinct() responds with "Input to n_distinct() must be a single variable name from the data set" (which makes sense; the problem is just that I have the variable name in a string...).
This works fine outside a function (mention is a column name in the data.frame):
summarize(data, count=n_distinct(mention))
This was my first effort:
getProportions <- function(datain, id_column) {
  overall_total <- summarize(datain, count=n_distinct(id_column))[1,1]
}
getProportions(measures, "mention")
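For comparison, here is a minimal sketch of that first attempt adapted to current dplyr (assumed to be version 1.0 or later), where the underscore verbs are not needed. Inside the function, `id_column` holds a string, so in current dplyr `n_distinct(id_column)` would just count that one string; the `.data[[...]]` pronoun looks the column up by name instead. `mtcars` stands in for `measures`, which isn't available here, and the function now returns the count explicitly.

```r
library(dplyr)

# Sketch assuming dplyr >= 1.0.
# .data[[id_column]] fetches the column whose name is stored in the
# string id_column, rather than treating the string itself as data.
getProportions <- function(datain, id_column) {
  overall_total <- summarise(datain, count = n_distinct(.data[[id_column]]))[["count"]]
  overall_total
}

# mtcars stands in for the question's `measures` data frame.
getProportions(mtcars, "cyl")
```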
After reading the NSE documentation and some threads here about programming with dplyr, I tried:
overall_total <- summarize_(datain, count=interp(~n_distinct(var),var=as.name(id_column)))[1,1]
but to no avail. Any ideas? It almost seems as though an n_distinct_() is needed.
Edit: My apologies, and thanks. You are right, the interp() version does work; it seems that I never quite hit that full combination. Looking over my old versions, whenever I had the var part right I was using plain summarize(), and whenever I used summarize_() I had left off the var= part of the interp() call. Sigh. My fault for not producing a full working example with both versions.
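The interp() version works because `as.name(id_column)` turns the string into a symbol that is substituted into `~n_distinct(var)` before dplyr evaluates it. With rlang-based tidy evaluation (dplyr 0.7 and later, assumed here), the same idea is usually written with `rlang::sym()` and the `!!` operator. A minimal sketch, using the hypothetical function name `count_distinct_sym`:

```r
library(dplyr)

# rlang::sym() converts the string to a symbol and !! splices that
# symbol into the expression, much as interp(var = as.name(...)) did.
count_distinct_sym <- function(datain, id_column) {
  summarise(datain, count = n_distinct(!!rlang::sym(id_column)))
}

count_distinct_sym(mtcars, "cyl")
```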
As indicated in the comments, the right way to do this was my second option, which apparently I had never quite tested (I'd left off the var = part of the interp() call):
f <- function(data, col) {
  summarise_(data, count = interp(~n_distinct(var), var = as.name(col)))
}
f(mtcars, "cyl")
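Note that summarise_() and the other underscore verbs were deprecated in dplyr 0.7.0, and lazyeval has since been superseded by rlang. If the column can be passed as a bare name rather than a string, a minimal sketch using the `{{ }}` (embrace) operator, assuming dplyr 1.0 or later, might look like this (`f_bare` is a hypothetical name):

```r
library(dplyr)

# {{ col }} forwards the caller's unquoted column name into summarise().
f_bare <- function(data, col) {
  summarise(data, count = n_distinct({{ col }}))
}

f_bare(mtcars, cyl)  # column given unquoted, not as "cyl"
```

Embracing suits callers who type column names directly; when the name arrives as a string, as in the question, a string-based lookup is the closer fit.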