Josh J

Reputation: 475

In R: find the closest value within group_by excluding self comparisons

For each row, I would like to find the value in the same column (and the same group) that has the smallest absolute difference from that row's value, excluding the row itself.

I've tried the solutions from "Find value closest to x by group in dplyr" and "Return index from a vector of the value closest to a given element".

My code:

library(dplyr)
library(DescTools)

# Attempt: nearest dist within each river (returns dist itself, see below)
data %>%
  select(river, dist, id) %>%
  group_by(river) %>%
  mutate(NNdist = Closest(dist, dist))

For id = TYWI03 I would expect NNdist = 1690, and for id = TAFF04 I would expect NNdist = 1607, but the value returned is the reference value itself, i.e. it is returning a from Closest(x, a).

The data, including the NNdist column my code currently returns, is:

 river  dist id     NNdist
  <chr> <dbl> <chr>   <dbl>
1 Tywi     34 TYWI03     34
2 Tywi   1690 TYWI02   1690
3 Tywi   1747 TYWI01   1747
4 Taff   1607 TAFF05   1607
5 Taff   4341 TAFF04   4341
6 Taff  12357 TAFF03  12357
7 Taff  16111 TAFF02  16111
8 Taff  18124 TAFF01  18124

Upvotes: 1

Views: 395

Answers (1)

Josh J

Reputation: 475

I answered this using the approach from a question I asked years ago, "Count values less than x and find nearest values to x by multiple groups":

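library(dplyr)

# For each row, find the id of the nearest other site on the same river
temp1 <- data %>%
  group_by(river) %>%
  # n_ds: number of sites on the river with a smaller dist
  mutate(n_ds = match(dist, sort(dist)) - 1) %>%
  # sort(n)[2] skips the zero self-distance and takes the next smallest
  mutate(closest_uid = apply(sapply(dist, function(i) abs(i - dist)), 2,
                             function(n) id[which(n == sort(n)[2])])) %>%
  data.frame()

# Lookup table of each site's dist, keyed by id
tempdist <- temp1 %>% select(dist, id) %>% rename(rivDist = dist)

# Attach the nearest site's dist and compute the gap to it
temp2 <- temp1 %>% left_join(tempdist, by = c('closest_uid' = 'id')) %>%
  mutate(mindist = abs(dist - rivDist))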
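
A more direct variant is also possible. The following is only a sketch: nearest_other() is a hypothetical helper written for this example (it is not part of DescTools or dplyr). It builds the pairwise absolute differences within each group, sets the diagonal to Inf so a value can never match itself, and returns the nearest remaining value. On ties it keeps the first match, and a group with a single row gets NA.

```r
library(dplyr)

# Hypothetical helper: for each element of x, return the OTHER element of x
# with the smallest absolute difference.
nearest_other <- function(x) {
  if (length(x) < 2) return(rep(NA_real_, length(x)))
  d <- abs(outer(x, x, "-"))   # pairwise absolute differences
  diag(d) <- Inf               # exclude self-comparisons
  x[apply(d, 1, which.min)]    # first nearest value on ties
}

# Example data copied from the question
data <- data.frame(
  river = c("Tywi", "Tywi", "Tywi", "Taff", "Taff", "Taff", "Taff", "Taff"),
  dist  = c(34, 1690, 1747, 1607, 4341, 12357, 16111, 18124),
  id    = c("TYWI03", "TYWI02", "TYWI01", "TAFF05", "TAFF04",
            "TAFF03", "TAFF02", "TAFF01")
)

result <- data %>%
  group_by(river) %>%
  mutate(NNdist = nearest_other(dist),      # nearest other dist on the river
         mindist = abs(dist - NNdist)) %>%  # gap to that nearest site
  ungroup()
```

With the question's data this gives NNdist = 1690 for TYWI03 and NNdist = 1607 for TAFF04, matching the expected values. The pairwise matrix grows with the square of the group size, so this is best suited to small groups like the ones here.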

Upvotes: 1
