select repeated row observations with the least absolute difference

Question

I have a data frame like this:

df <- data.frame(id = c(1,1,1,2,2,3,3,3,3),
                  vars = c(1,2,5, 1,3, 0,2,4,-1))

> df
  id vars
1  1    1
2  1    2
3  1    5
4  2    1
5  2    3
6  3    0
7  3    2
8  3    4
9  3   -1

In this data frame each id can have several observations. For each id, I want to select the pair of observations (two rows) whose vars values have the smallest absolute difference.

In the above case that would be:

for id 1, values 1 and 2, which have the lowest absolute difference (1); for id 2, which has only two observations, both are selected automatically; for id 3, values 0 and -1, because their absolute difference of 1 is lower than that of any other combination.
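To make the expected result concrete, here is a minimal base R sketch of the exhaustive approach the question implies, where every pair of values within an id is compared. It is illustrative only: `closest_pair_bruteforce` is a hypothetical helper name, and it assumes each id has at least two observations (`combn()` errors otherwise).

```r
# Exhaustive check: for each id, compare every pair of vars values and keep
# the pair with the smallest absolute difference (O(n^2) pairs per id).
df <- data.frame(id = c(1,1,1,2,2,3,3,3,3),
                 vars = c(1,2,5, 1,3, 0,2,4,-1))

closest_pair_bruteforce <- function(x) {
  # all 2-element combinations of positions, one pair per column
  pairs <- combn(length(x), 2)
  gaps <- abs(x[pairs[1, ]] - x[pairs[2, ]])
  # which.min() returns the first minimal pair if there are ties
  x[pairs[, which.min(gaps)]]
}

result <- lapply(split(df$vars, df$id), closest_pair_bruteforce)
result
```

This reproduces the pairs listed above (1 and 2, 1 and 3, 0 and -1), but the number of comparisons grows quadratically with the number of rows per id.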

IceCreamToucan · Accepted Answer

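You don't need to compare every pair (or rather, you can let arrange() do the comparisons for you): once the values are sorted, each value sits next to the value closest to it, so the pair with the smallest absolute difference is always a pair of neighbours. After sorting, diff() is never negative, so it already gives the absolute differences.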
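The sorting argument can be checked on the id 3 values: the smallest gap between neighbours in sorted order equals the smallest gap over all pairs. This is an illustrative check rather than part of the original answer; it uses dist(), which for a plain numeric vector returns every pairwise absolute difference.

```r
# Check the sorting argument on the id 3 values.
x <- c(0, 2, 4, -1)

sorted <- sort(x)                # -1  0  2  4
adjacent_gaps <- diff(sorted)    #  1  2  2  (never negative after sorting)
all_gaps <- as.vector(dist(x))   # |xi - xj| for every pair i < j

min(adjacent_gaps) == min(all_gaps)    # TRUE
# first element of the best neighbouring pair, plus the one after it
sorted[which.min(adjacent_gaps) + 0:1] # -1  0
```

The `which.min(...) + 0:1` indexing is the same trick the answer uses inside slice().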

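library(dplyr)

df %>% 
  group_by(id) %>% 
  arrange(vars) %>%                     # sort by vars (arrange ignores groups by default)
  slice(which.min(diff(vars)) + 0:1)    # per id: start of the smallest gap and the next row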

# # A tibble: 6 x 2
# # Groups:   id [3]
#      id  vars
#   <dbl> <dbl>
# 1     1     1
# 2     1     2
# 3     2     1
# 4     2     3
# 5     3    -1
# 6     3     0
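If you'd rather not depend on dplyr, the same sorted-neighbour idea can be written in base R. This is a sketch, not part of the original answer; `closest_pair_rows` is a hypothetical helper name. Ids with a single row simply contribute no rows, because diff() of one value is empty.

```r
# Base R version: sort within each id, then keep the neighbouring pair
# whose difference is smallest.
df <- data.frame(id = c(1,1,1,2,2,3,3,3,3),
                 vars = c(1,2,5, 1,3, 0,2,4,-1))

closest_pair_rows <- function(d) {
  d <- d[order(d$vars), ]               # sort this id's rows by vars
  d[which.min(diff(d$vars)) + 0:1, ]    # first minimal gap and its neighbour
}

res <- do.call(rbind, lapply(split(df, df$id), closest_pair_rows))
rownames(res) <- NULL                   # drop the "1.1"-style names from rbind
res
```

Sorting costs O(n log n) per id, versus the O(n^2) pairs an exhaustive comparison would check.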

data.table version

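library(data.table)
setDT(df)   # convert df to a data.table by reference

# inner query: sort by vars, then per id take the row numbers in df (.I) of
# the closest neighbouring pair; outer query: select those rows from df
df[df[order(vars), .I[which.min(diff(vars)) + 0:1], id]$V1]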

#    id vars
# 1:  3   -1
# 2:  3    0
# 3:  1    1
# 4:  1    2
# 5:  2    1
# 6:  2    3
