jameshowison

Reputation: 181

Using dplyr n_distinct in function with quoted variable

I'm trying to use dplyr within a function, passing in a column name as a variable to then be used with n_distinct in the summarize function.

I understand that programming with dplyr has become easier with summarize_, arrange_, etc., as described in vignette("nse"). I've also tried various combinations of interp from lazyeval. n_distinct responds with "Input to n_distinct() must be a single variable name from the data set" (which makes sense; it's just that I have the variable name in a string ...)

This works fine outside a function (mention is a column name in the data.frame):

summarize(data, count=n_distinct(mention))

This was my first effort:

getProportions <- function(datain, id_column) {
    overall_total <- summarize(datain, count=n_distinct(id_column))[1,1]
}

getProportions(measures, "mention")

And after reading the NSE documentation and some threads on here about programming with dplyr I tried:

overall_total <- summarize_(datain, count=interp(~n_distinct(var),var=as.name(id_column)))[1,1]

but to no avail. Any ideas? Almost seems like n_distinct_() is needed?

Edit: My apologies and thanks. You are right, the interp version does work; it seems I never quite hit that full combination. Looking over my old versions, when I had the var part right I was using plain summarize(), and when I used summarize_() I left off the var= part of the interp call. Sigh. My fault for not producing a full working example with both versions.

Upvotes: 4

Views: 2565

Answers (1)

jameshowison

Reputation: 181

As indicated in the comments, the right way to do this was my second option, which apparently I had never quite tested (I'd left off the var = part of the interp call):

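library(dplyr)
library(lazyeval)  # provides interp()

# col is a string; as.name() turns it into a symbol that interp() substitutes for var
f <- function(data, col) {
    summarise_(data, count = interp(~n_distinct(var), var = as.name(col)))
}
f(mtcars, "cyl")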
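
For anyone landing here on a newer dplyr: the underscore verbs such as summarise_() were deprecated in dplyr 0.7, so the interp() approach is mostly of historical interest. Below is a minimal sketch of the same idea using the .data pronoun, assuming dplyr 0.7 or later is installed. The function name count_distinct is just an illustrative choice, not something from the original code.

```r
library(dplyr)

# Hypothetical helper: `col` is a column name supplied as a string.
# .data[[col]] looks that column up inside the data frame being summarised,
# so no lazyeval::interp() or as.name() is needed.
count_distinct <- function(data, col) {
  summarise(data, count = n_distinct(.data[[col]]))
}

count_distinct(mtcars, "cyl")
```

With mtcars this returns a one-row data frame whose count is 3, since cyl only takes the values 4, 6 and 8.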

Upvotes: 3
