Reputation: 651
Suppose I have a data frame as follows:
library(dplyr)  # provides %>%
a = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100) %>% data.frame()
colnames(a) = c("column1")
and a vector,
b = c( 46, 90, 75, 15)
For each element of a, I want to find the closest element of b. The required output would be:
a b
10 15
20 15
30 15
40 46
50 46
60 46
70 75
80 75
90 90
100 90
Here is what I have tried so far.
My idea is to add row names to a and b, do a full join, compute the difference for every combination, and take the minimum difference. But after adding row names, the full join only matches the first four elements:
a %>% add_rownames('rowname') %>% full_join(b %>% add_rownames(rowname), by = c("rowname" = "rowname"))
This doesn't work. Can anybody help me in solving this problem?
Upvotes: 2
Views: 1438
Reputation: 43354
One option is to use outer with - to compute the difference for every pairing of an element of a$column1 with an element of b, producing a matrix. Taking the negative absolute value of that matrix lets you use max.col to find, for each row, which index of b has the smallest difference. Subsetting b by those indices returns the closest values, so
a$b <- b[max.col(-abs(outer(a$column1, b, `-`)))]
returns
a
## column1 b
## 1 10 15
## 2 20 15
## 3 30 15
## 4 40 46
## 5 50 46
## 6 60 46
## 7 70 75
## 8 80 75
## 9 90 90
## 10 100 90
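To make the one-liner easier to follow, here is a minimal sketch that splits it into its intermediate steps. It rebuilds the question's data in base R so no packages are needed; the names d, score and idx are only illustrative. One assumption beyond the original answer: ties.method = "first" is passed to max.col, because its default breaks ties at random, which would make the result non-reproducible if an element of a were equally close to two elements of b.

```r
# Data from the question, rebuilt without the pipe
a <- data.frame(column1 = seq(10, 100, by = 10))
b <- c(46, 90, 75, 15)

# Step 1: 10 x 4 matrix of differences, d[i, j] = a$column1[i] - b[j]
d <- outer(a$column1, b, `-`)

# Step 2: negate the absolute differences so that the smallest
# distance in each row becomes that row's maximum
score <- -abs(d)

# Step 3: column index of the maximum in each row;
# ties.method = "first" is an added assumption (the default is "random")
idx <- max.col(score, ties.method = "first")

# Step 4: look up the matching values of b
a$b <- b[idx]
a
```

Note that outer builds a full length(a$column1) by length(b) matrix, so memory use grows with the product of the two lengths.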
You could equally work element-wise, if you prefer. In dplyr, grouping with rowwise makes such an approach fairly straightforward:
library(dplyr)
a %>% rowwise() %>% mutate(b = b[which.min(abs(column1 - b))])
## Source: local data frame [10 x 2]
## Groups: <by row>
##
## # A tibble: 10 × 2
## column1 b
## <dbl> <dbl>
## 1 10 15
## 2 20 15
## 3 30 15
## 4 40 46
## 5 50 46
## 6 60 46
## 7 70 75
## 8 80 75
## 9 90 90
## 10 100 90
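If you would rather not depend on dplyr, the same element-wise idea can be sketched in base R with vapply. The helper name nearest_in is hypothetical and not part of any package. which.min returns the first minimum, so if a value is equally close to two elements of b, the one that appears earlier in b is chosen.

```r
# Data from the question, rebuilt in base R
a <- data.frame(column1 = seq(10, 100, by = 10))
b <- c(46, 90, 75, 15)

# Hypothetical helper: for a single value x, return the element of
# `candidates` with the smallest absolute difference from x
nearest_in <- function(x, candidates) {
  candidates[which.min(abs(x - candidates))]
}

# Apply the helper to every element of column1; numeric(1) declares
# that each call must return exactly one number
a$b <- vapply(a$column1, nearest_in, numeric(1), candidates = b)
a
```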
Upvotes: 1