Reputation: 3640
I have a data frame like this:
df <- data.frame(id = c(1,1,1,2,2,3,3,3,3),
vars = c(1,2,5, 1,3, 0,2,4,-1))
> df
id vars
1 1 1
2 1 2
3 1 5
4 2 1
5 2 3
6 3 0
7 3 2
8 3 4
9 3 -1
In this data frame each id
can have several observations.
I now want to select for each id
the pair (2 observations) that have the least absolute difference for vars
.
In the above case that would be
id vars
1 1 1
2 1 2
3 2 1
4 2 3
5 3 0
6 3 -1
for id
1, values 1 and 2 have the lowest absolute difference,
id
2 only has 2 observations so both are automatically selected.
for the id
3 the selected vars would be 0 and -1 because the absolute difference is 1, lower than all other combinations.
Upvotes: 4
Views: 125
Reputation: 28695
You don't need to do all the comparisons (or, you can let arrange
do your comparisons for you), because once you've sorted the values each value is already beside the value for which the difference is minimized.
df %>%
group_by(id) %>%
arrange(vars) %>%
slice(which.min(diff(vars)) + 0:1)
# # A tibble: 6 x 2
# # Groups: id [3]
# id vars
# <dbl> <dbl>
# 1 1 1
# 2 1 2
# 3 2 1
# 4 2 3
# 5 3 -1
# 6 3 0
data.table version
library(data.table)
setDT(df)
df[df[order(vars), .I[which.min(diff(vars)) + 0:1], id]$V1]
# id vars
# 1: 3 -1
# 2: 3 0
# 3: 1 1
# 4: 1 2
# 5: 2 1
# 6: 2 3
Upvotes: 7
Reputation: 2467
Not the most concise but works. Probably somebody can improve the idea.
df1%>%group_by(id)%>%mutate(vars2=which.min(abs(diff(combn(num(vars),2)))))%>%
mutate(vars1=ifelse(vars%in%combn(num(vars),2)[,vars2],vars,NA))%>%select(id,vars1)%>%.[complete.cases(.),]
# A tibble: 6 x 2
# Groups: id [3]
id vars1
<dbl> <dbl>
1 1 1
2 1 2
3 2 1
4 2 3
5 3 0
6 3 -1
The main idea is to do the difference on all the possible combinations of the values of each group. vars2
keeps the column with the lowest difference. If the value is one of the two present in the vars2
column, it is kept. Else, it is set as NA
. Then, only complete cases are returned.
Upvotes: 1