Reputation: 1639
I have the following example data:
d.1 = data.frame(id = c(1, 1, 2, 3, 3), date = c(2001, 2002, 2001, 2001, 2003), measure = 1:5)
d.2 = data.frame(id = c(1, 2, 2, 3, 3), date = c(2001, 2002, 2003, 2002, 2008), measure = 1:5)
d = merge(d.1, d.2, all = TRUE, by = "id")
d.1 and d.2 are two kinds of measurements, and I need one measurement of each kind per id. The two measurements should be as close to each other in date as possible. I can do that with dplyr by
require(dplyr)
d = d %>%
group_by(id) %>%
do(.[which.min(abs(.$date.x-.$date.y)),])
The question is how I can use dplyr if the names of the date columns are stored in variables, e.g. name.x = "date.x" and name.y = "date.y", because I can't use
...
do(.[which.min(abs(.[, name.x]-.[, name.y])),])
...
I tried to find another solution using eval, as.symbol, and similar functions, but I couldn't figure one out.
Upvotes: 2
Views: 548
Reputation: 2743
Since version 0.4 (released just after this question was answered), dplyr has included a standard-evaluation version, do_, which in theory should be easier to program with than the NSE version. You can use it similarly:
interp <- lazyeval::interp
d %>%
group_by(id) %>%
do_(interp(~ .[which.min(abs(.$x - .$y)), ],
x = as.name(name.x), y = as.name(name.y)))
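To see what interp is doing here (name.x and name.y are the variables from the question), it substitutes the supplied symbols into the formula up front, so do_ receives a formula that already references the real column names. A minimal sketch:

```r
library(lazyeval)

name.x <- "date.x"
name.y <- "date.y"

# interp() replaces the placeholders x and y with the symbols
# date.x and date.y before the formula is ever evaluated
f <- interp(~ .[which.min(abs(.$x - .$y)), ],
            x = as.name(name.x), y = as.name(name.y))
f
# ~.[which.min(abs(.$date.x - .$date.y)), ]
```

So the do_ call above is equivalent to writing the column names literally in the NSE version.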
I'm not sure it's any easier to read or write than the NSE version. For the other verbs, code can remain concise while still accessing column names programmatically. For do_, however, one must use the dot pronoun to access column names, as discussed in this question. As a consequence, I think you always need interp with do_, which makes the code more verbose than the NSE version in the earlier answer.
Upvotes: 0
Reputation: 14366
d$date.x returns a vector, while .[, name.x] inside do() returns a data frame (the data passed to do() is a tbl, and [ on a tbl never drops to a vector), which does not work when passed to abs(). So simply change the way you access the column to .[[name.x]], which always extracts a vector, and it will work:
d %>% group_by(id) %>% do(.[which.min(abs(.[[name.x]] -.[[name.y]])),])
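Putting it together with the example data from the question (a quick check; name.x and name.y hold the column names as in the question):

```r
library(dplyr)

# Data from the question
d.1 <- data.frame(id = c(1, 1, 2, 3, 3), date = c(2001, 2002, 2001, 2001, 2003), measure = 1:5)
d.2 <- data.frame(id = c(1, 2, 2, 3, 3), date = c(2001, 2002, 2003, 2002, 2008), measure = 1:5)
d   <- merge(d.1, d.2, all = TRUE, by = "id")

name.x <- "date.x"
name.y <- "date.y"

# [[ extracts a plain numeric vector even from a tbl, so the
# subtraction inside which.min() behaves as intended
result <- d %>%
  group_by(id) %>%
  do(.[which.min(abs(.[[name.x]] - .[[name.y]])), ])

result
# one row per id, pairing the closest date.x / date.y measurements
```

For id = 3 there is a tie (|2001 - 2002| and |2003 - 2002| are both 1); which.min keeps the first matching row.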
Upvotes: 3