Reputation: 1639
I have the following example data:
d.1 = data.frame(id = c(1, 1, 2, 3, 3), date = c(2001, 2002, 2001, 2001, 2003), measure = 1:5)
d.2 = data.frame(id = c(1, 2, 2, 3, 3), date = c(2001, 2002, 2003, 2002, 2008), measure = 1:5)
d = merge(d.1, d.2, all = TRUE, by = "id")
d.1 and d.2 are two kinds of measurements, and I need one measurement of each kind per id. The two measurements should be as close to each other in date as possible. I can do that with dplyr by
require(dplyr)
d = d %>%
group_by(id) %>%
do(.[which.min(abs(.$date.x-.$date.y)),])
The question is how I can use dplyr if the names of the date columns are stored in variables, e.g. name.x = "date.x" and name.y = "date.y", because I can't use
...
do(.[which.min(abs(.[, name.x]-.[, name.y])),])
...
I tried to find another solution using eval, as.symbol, and similar functions, but I couldn't figure one out.
Upvotes: 2
Views: 548
Reputation: 2743
Since version 0.4 (released just after this question was answered), dplyr has included a standard-evaluation version, do_, which in theory should be easier to program with than the NSE version. You can use it similarly:
interp <- lazyeval::interp
d %>%
group_by(id) %>%
do_(interp(~ .[which.min(abs(.$x - .$y)), ],
x = as.name(name.x), y = as.name(name.y)))
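To see what interp is doing here (name.x and name.y are the variables from the question), it substitutes the supplied symbols into the formula up front, so do_ receives a formula that already references the real column names. A minimal sketch:

```r
library(lazyeval)

name.x <- "date.x"
name.y <- "date.y"

# interp() replaces the placeholders x and y with the symbols
# date.x and date.y before the formula is ever evaluated
f <- interp(~ .[which.min(abs(.$x - .$y)), ],
            x = as.name(name.x), y = as.name(name.y))
f
# ~.[which.min(abs(.$date.x - .$date.y)), ]
```

So the do_ call above is equivalent to writing the column names literally in the NSE version.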
I'm not sure it's any easier to read or write than the NSE version. For the other verbs, code can remain concise while still accessing column names programmatically. For do_, however, one must use the dot pronoun to access column names, as discussed in this question. As a consequence, I think you always need interp with do_, which makes the code more verbose than the NSE version in the earlier answer.
Upvotes: 0
Reputation: 14366
d$date.x returns a vector, while .[, name.x] inside do() returns a data frame (the data passed to do() is a tbl, and [ on a tbl never drops to a vector), which does not work when passed to abs(). So simply change the way you access the column to .[[name.x]], which always extracts a vector, and it will work:
d %>% group_by(id) %>% do(.[which.min(abs(.[[name.x]] -.[[name.y]])),])
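Putting it together with the example data from the question (a quick check; name.x and name.y hold the column names as in the question):

```r
library(dplyr)

# Data from the question
d.1 <- data.frame(id = c(1, 1, 2, 3, 3), date = c(2001, 2002, 2001, 2001, 2003), measure = 1:5)
d.2 <- data.frame(id = c(1, 2, 2, 3, 3), date = c(2001, 2002, 2003, 2002, 2008), measure = 1:5)
d   <- merge(d.1, d.2, all = TRUE, by = "id")

name.x <- "date.x"
name.y <- "date.y"

# [[ extracts a plain numeric vector even from a tbl, so the
# subtraction inside which.min() behaves as intended
result <- d %>%
  group_by(id) %>%
  do(.[which.min(abs(.[[name.x]] - .[[name.y]])), ])

result
# one row per id, pairing the closest date.x / date.y measurements
```

For id = 3 there is a tie (|2001 - 2002| and |2003 - 2002| are both 1); which.min keeps the first matching row.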
Upvotes: 3