Reputation: 3640
I have a data frame and vector like this:
df1 <- data.frame(orig = c(1,1,1,2,2,2,2,3,3),
proxy = c(1,43,65,2,44,45,46,3,55),
dist = c(0, 100,101, 10, 1000, 5000, 5001,0,3))
v <- c(1,45:100)
I now want the following:
For each unique value in df1$orig (here it's numeric for simplicity, but it could be character too), if the same orig value is not available in v, find the best proxy, i.e. the one with the lowest dist that is available in v.
In this example the first value in df1$orig is 1, and this value is available in v as well, so we take it.
The second unique value in df1$orig is 2, and this is not available in v. The best proxy with the lowest dist is 44 in this case, but it is not in v either. The next best is 45, and this value is in v, so we take it.
The third unique value in df1$orig is 3, and there is no 3 in v. The best proxy here is 55.
The solution is c(1,45,55).
Note that the first value for each orig in proxy is the orig value itself. dist is sorted here, but that is not necessarily always the case.
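To make the rule above concrete, here is a minimal base R sketch that applies it literally, one orig at a time. pick_proxy is a hypothetical helper name (not from any package), and the sketch assumes numeric orig and proxy values as in the example; for character values the numeric(1) and NA_real_ parts would need to change.

```r
df1 <- data.frame(orig  = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                  proxy = c(1, 43, 65, 2, 44, 45, 46, 3, 55),
                  dist  = c(0, 100, 101, 10, 1000, 5000, 5001, 0, 3))
v <- c(1, 45:100)

# pick_proxy is a hypothetical helper written for this illustration only.
pick_proxy <- function(o, df, v) {
  if (o %in% v) return(o)                       # orig itself is available in v
  cand <- df[df$orig == o & df$proxy %in% v, ]  # proxies of this orig that exist in v
  if (nrow(cand) == 0) return(NA_real_)         # no usable proxy for this orig
  cand$proxy[which.min(cand$dist)]              # proxy with the lowest dist
}

result <- vapply(unique(df1$orig), pick_proxy, numeric(1), df = df1, v = v)
result
# 1 45 55
```

Because which.min() is used, dist does not have to be sorted beforehand.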
Upvotes: 2
Views: 68
Reputation: 39657
In case you are also interested in a base R solution besides the dplyr one:
First reduce to the rows whose proxy has a match in v, then order by orig and dist, and then take those which are not duplicated.
tt <- df1[df1$proxy %in% v,]
tt <- tt[order(tt$orig, tt$dist),]
tt[!duplicated(tt$orig),]
# orig proxy dist
#1 1 1 0
#6 2 45 5000
#9 3 55 3
Or, in case you lose some orig values because none of their proxies has a match in v, you can use the following, which returns NA for those:
tt <- df1[df1$proxy %in% v,]
tt <- tt[order(tt$orig, tt$dist),]
tt <- tt[!duplicated(tt$orig),c("orig", "proxy")]
tt$proxy[match(unique(df1$orig), tt$orig)]
#[1] 1 45 55
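To illustrate the difference, here is a small sketch of the same steps with a shortened vector. v2 is a hypothetical vector made up for this illustration: it no longer contains 55, so orig 3 has no usable proxy.

```r
df1 <- data.frame(orig  = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                  proxy = c(1, 43, 65, 2, 44, 45, 46, 3, 55),
                  dist  = c(0, 100, 101, 10, 1000, 5000, 5001, 0, 3))
v2 <- c(1, 45:50)  # hypothetical: neither 3 nor 55 is available

tt <- df1[df1$proxy %in% v2, ]                      # keep proxies found in v2
tt <- tt[order(tt$orig, tt$dist), ]                 # lowest dist first within each orig
tt <- tt[!duplicated(tt$orig), c("orig", "proxy")]  # first row per orig

# match() returns NA for orig values that were dropped by the filter,
# so the result keeps one entry per unique orig.
res <- tt$proxy[match(unique(df1$orig), tt$orig)]
res
# 1 45 NA
```

Without the match() step, orig 3 would simply be missing from the result rather than showing up as NA.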
Upvotes: 1
Reputation: 11878
This can be done in a couple of steps with {dplyr}: keep the proxies that are in v, sort by dist and pick the first for each orig:
library(dplyr)
df1 %>%
filter(proxy %in% v) %>%
arrange(dist) %>%
group_by(orig) %>%
slice(1)
#> # A tibble: 3 x 3
#> # Groups: orig [3]
#> orig proxy dist
#> <dbl> <dbl> <dbl>
#> 1 1 1 0
#> 2 2 45 5000
#> 3 3 55 3
Created on 2019-09-11 by the reprex package (v0.3.0)
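If only the vector of proxies is needed, a possible variant is sketched below. It assumes dplyr 1.0.0 or later, where slice_min() is available (newer than the version used in the reprex above), and replaces arrange() plus slice(1) with slice_min().

```r
library(dplyr)

df1 <- data.frame(orig  = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                  proxy = c(1, 43, 65, 2, 44, 45, 46, 3, 55),
                  dist  = c(0, 100, 101, 10, 1000, 5000, 5001, 0, 3))
v <- c(1, 45:100)

res <- df1 %>%
  filter(proxy %in% v) %>%                          # keep proxies that are in v
  group_by(orig) %>%
  slice_min(dist, n = 1, with_ties = FALSE) %>%     # lowest dist per orig, exactly one row
  ungroup() %>%
  pull(proxy)                                       # extract the proxy column as a vector
res
# 1 45 55
```

As with the pipeline above, an orig whose proxies are all missing from v is dropped from the result.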
Upvotes: 3