Reputation: 560
In the dataset below, how could I create a new column min.diff that reports, for a given observation x, the minimum distance between x and any other observation y within its group (identified by the group column)? I would like to measure the distance between x and y by abs(x-y).
set.seed(1)
df <- data.frame(
  group = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  value = sample(1:10, 8, replace = TRUE)
)
Expected output:
  group value min.diff
1     A     9        2
2     A     4        3
3     A     7        2
4     B     1        1
5     B     2        1
6     C     7        4
7     C     2        1
8     C     3        1
I prefer a solution using dplyr.
The only approach I can think of is to expand the data frame so that it has one row for each possible pair within a group, calculate the distances, and then keep the smallest distance for each observation. Is there a more compact way?
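For reference, a minimal sketch of that pairwise approach might look like the following. The values are written out explicitly so that the example does not depend on the random number generator, and the id column and the .other suffix are names introduced only for this illustration.

```r
library(dplyr)

df <- data.frame(
  group = c("A", "A", "A", "B", "B", "C", "C", "C"),
  value = c(9, 4, 7, 1, 2, 7, 2, 3)
)

# Tag each row so it can be told apart from its partners (id is illustrative)
indexed <- df %>% mutate(id = row_number())

# Self-join on group: one row per ordered pair of observations within a group
pairs <- merge(indexed, indexed, by = "group", suffixes = c("", ".other"))

# Drop self-pairs, then keep the smallest distance for each observation
min_diffs <- pairs %>%
  filter(id != id.other) %>%
  group_by(id) %>%
  summarise(min.diff = min(abs(value - value.other)))

result <- indexed %>%
  left_join(min_diffs, by = "id") %>%
  select(-id)

result$min.diff
# 2 3 2 1 1 4 1 1
```

This works, but the intermediate table grows with the square of the group size, which is why a more compact alternative would be welcome.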
Upvotes: 0
Views: 642
Reputation: 685
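If the row order doesn't matter, sort by value within each group so that each value's nearest neighbour is the row directly before or after it: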
library(dplyr)
df %>%
  arrange(group, value) %>% # Order ascending by value, within each group
  group_by(group) %>%
  mutate(min.diff = case_when(
    # If the "group" for the previous and next entry are the same as the current group, take the smaller of the two differences.
    # pmin() compares row by row; min() would return one value for the whole group.
    lag(group) == group & lead(group) == group ~ pmin(abs(value - lag(value)), abs(value - lead(value)), na.rm = TRUE),
    # Otherwise, if only the previous entry's group is the same as the current one, take the difference from the previous
    lag(group) == group ~ abs(value - lag(value)),
    # Otherwise, if only the next entry's group is the same as the current one, take the difference from the next
    lead(group) == group ~ abs(value - lead(value))
  )) %>%
  ungroup()
# group value min.diff
# <chr> <int> <int>
# 1 A 4 3
# 2 A 7 2
# 3 A 9 2
# 4 B 1 1
# 5 B 2 1
# 6 C 2 1
# 7 C 3 1
# 8 C 7 4
If the order is important, you could add in an index and rearrange it after, like so:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(index = row_number()) %>% # create the index
  arrange(group, value) %>%
  mutate(min.diff = case_when(
    # pmin() compares row by row; min() would return one value for the whole group
    lag(group) == group & lead(group) == group ~ pmin(abs(value - lag(value)), abs(value - lead(value)), na.rm = TRUE),
    lag(group) == group ~ abs(value - lag(value)),
    lead(group) == group ~ abs(value - lead(value))
  )) %>%
  ungroup() %>%
  arrange(group, index) %>% # rearrange by the index
  select(-index) # remove the index
# group value min.diff
# <chr> <int> <int>
# 1 A 9 2
# 2 A 4 3
# 3 A 7 2
# 4 B 1 1
# 5 B 2 1
# 6 C 7 4
# 7 C 2 1
# 8 C 3 1
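Because lag() and lead() already return NA at the edges of a group, the three branches can probably be collapsed into a single element-wise pmin() call with na.rm = TRUE. The following is a minimal sketch under that assumption; add_min_diff is a hypothetical helper name, and the values are written out explicitly rather than sampled.

```r
library(dplyr)

df <- data.frame(
  group = c("A", "A", "A", "B", "B", "C", "C", "C"),
  value = c(9, 4, 7, 1, 2, 7, 2, 3)
)

# Hypothetical helper: expects columns named group and value
add_min_diff <- function(data) {
  data %>%
    mutate(index = row_number()) %>%          # remember the original row order
    group_by(group) %>%
    arrange(value, .by_group = TRUE) %>%      # nearest neighbours become adjacent
    mutate(min.diff = pmin(abs(value - lag(value)),
                           abs(value - lead(value)),
                           na.rm = TRUE)) %>% # NA at a group edge is ignored
    ungroup() %>%
    arrange(index) %>%                        # restore the original order
    select(-index)
}

result <- add_min_diff(df)
result$min.diff
# 2 3 2 1 1 4 1 1

# A group with four values, where the row-by-row comparison matters
wide <- add_min_diff(data.frame(group = "D", value = c(1, 5, 10, 11)))
wide$min.diff
# 4 4 1 1
```

A group with a single observation has no neighbour at all, so its min.diff comes out as NA.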
Upvotes: 0
Reputation: 388907
We can use map_dbl to subtract the current value from all other values in its group and take the minimum of the absolute differences.
library(dplyr)
library(purrr)
df %>%
group_by(group) %>%
mutate(min.diff = map_dbl(row_number(), ~min(abs(value[-.x] - value[.x]))))
# group value min.diff
# <chr> <int> <dbl>
#1 A 9 2
#2 A 4 3
#3 A 7 2
#4 B 1 1
#5 B 2 1
#6 C 7 4
#7 C 2 1
#8 C 3 1
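The same all-pairs comparison can also be sketched without purrr by building the full distance matrix for each group with outer(). This is only an illustrative variant, not part of the answer above; nearest_dist is a hypothetical helper name, and the values are written out explicitly rather than sampled.

```r
library(dplyr)

df <- data.frame(
  group = c("A", "A", "A", "B", "B", "C", "C", "C"),
  value = c(9, 4, 7, 1, 2, 7, 2, 3)
)

# Hypothetical helper: distance from each element of v to its nearest other element
nearest_dist <- function(v) {
  d <- abs(outer(v, v, "-"))  # matrix of all pairwise absolute differences
  diag(d) <- Inf              # ignore the distance of each element to itself
  apply(d, 1, min)            # smallest remaining distance in each row
}

out <- df %>%
  group_by(group) %>%
  mutate(min.diff = nearest_dist(value)) %>%
  ungroup()

out$min.diff
# 2 3 2 1 1 4 1 1
```

Like the map_dbl version, this compares every pair within a group, so it keeps the original row order but costs quadratic time and memory in the group size. A group with a single observation gets Inf.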
Upvotes: 1
Reputation: 887048
We can use combn to compute the pairwise differences between the 'value' entries and get the min of the absolute values
library(dplyr)
df1 <- df %>%
  group_by(group) %>% # compare values only within each group, as the question asks
  mutate(new = min(abs(combn(value, 2, FUN = function(x) x[1] - x[2])))) %>%
  ungroup()
If we want the minimum distance between a given element (e.g. the first) and the rest
df1 <- df %>%
  group_by(group) %>% # compare values only within each group, as the question asks
  mutate(new = min(abs(value[-1] - first(value)))) %>%
  ungroup()
Upvotes: 1