Reputation: 560

Computing minimum distance between observations within groups

In the dataset below, how could I create a new column min.diff that reports, for a given observation x, the minimum distance between x and any other observation y within its group (identified by the group column)? I would like to measure the distance between x and y by abs(x-y).

set.seed(1)

df <- data.frame(
  group = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
  value = sample(1:10, 8, replace = TRUE)
)

Expected output:

  group value min.diff
1     A     9        2
2     A     4        3
3     A     7        2
4     B     1        1
5     B     2        1
6     C     7        4
7     C     2        1
8     C     3        1

I would prefer a solution using dplyr. The only approach I can think of is to expand the data frame so that it holds every possible pair within each group, compute the distances, and then keep the smallest distance for each observation. Is there a more compact way?
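For reference, the pair-expansion idea described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the data are re-entered literally so the result does not depend on the random number generator, `id` and the `.other` suffix are names chosen here, and recent dplyr versions may warn about a many-to-many join.

```r
library(dplyr)

# Same values as in the question, entered literally (assumption: matches set.seed(1) output)
df <- data.frame(
  group = c("A", "A", "A", "B", "B", "C", "C", "C"),
  value = c(9L, 4L, 7L, 1L, 2L, 7L, 2L, 3L)
)

# Helper id so that a row is never paired with itself
indexed <- df %>% mutate(id = row_number())

pairwise <- indexed %>%
  inner_join(indexed, by = "group", suffix = c("", ".other")) %>% # every pair within a group
  filter(id != id.other) %>%                                      # drop self-pairs
  group_by(id, group, value) %>%
  summarise(min.diff = min(abs(value - value.other)), .groups = "drop") %>%
  arrange(id) %>%                                                 # back to the original row order
  select(-id)

pairwise
```

Note that the intermediate table grows with the square of the group size, and groups with a single observation drop out entirely after the filter step.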

Upvotes: 0

Answers (3)

R me matey

Reputation: 685

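If the row order doesn't matter, you can sort by value within each group; the nearest other value is then always the previous or the next row: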

library(dplyr)

df %>% 
  arrange(group, value) %>% #Order ascending by value, within each group
  group_by(group) %>% 
  mutate(min.diff = case_when(lag(group) == group & lead(group) == group ~ pmin(abs(value - lag(value)), abs(value - lead(value)), na.rm = TRUE), #If the "group" for the previous and next entry are the same as the current group, take the smaller of the two differences row by row (pmin is element-wise; min would return a single value for the whole group)
                              lag(group) == group ~ abs(value - lag(value)), #Otherwise, if only the previous entry's group is the same as the current one, take the difference from the previous
                              lead(group) == group ~ abs(value - lead(value)) #Otherwise, if only the next entry's group is the same as the current one, take the difference from the next
                              )
         ) %>%
  ungroup()

  #    group value min.diff
  #    <chr> <int>    <int>
  #  1 A         4        3
  #  2 A         7        2
  #  3 A         9        2
  #  4 B         1        1
  #  5 B         2        1
  #  6 C         2        1
  #  7 C         3        1
  #  8 C         7        4

If the order is important, you could add in an index and rearrange it after, like so:

library(dplyr)

df %>% 
  group_by(group) %>%
  mutate(index = row_number()) %>% #create the index
  arrange(group, value) %>%
  mutate(min.diff = case_when(lag(group) == group & lead(group) == group ~ pmin(abs(value - lag(value)), abs(value - lead(value)), na.rm = TRUE), #element-wise minimum of the two neighbouring differences
                              lag(group) == group ~ abs(value - lag(value)),
                              lead(group) == group ~ abs(value - lead(value))
                              )
         ) %>%
  ungroup() %>%
  arrange(group, index) %>% #rearrange by the index
  select(-index) #remove the index


#   group value min.diff
#   <chr> <int>    <int>
# 1 A         9        2
# 2 A         4        3
# 3 A         7        2
# 4 B         1        1
# 5 B         2        1
# 6 C         7        4
# 7 C         2        1
# 8 C         3        1
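Because the values are sorted within each group, the three case_when branches can arguably be collapsed into one element-wise pmin call that ignores a missing neighbour. A minimal sketch, assuming the data from the question (re-entered literally here); `row` is a helper column introduced only for this example:

```r
library(dplyr)

df <- data.frame(
  group = c("A", "A", "A", "B", "B", "C", "C", "C"),
  value = c(9L, 4L, 7L, 1L, 2L, 7L, 2L, 3L)
)

res <- df %>%
  mutate(row = row_number()) %>%               # remember the original position
  group_by(group) %>%
  arrange(value, .by_group = TRUE) %>%         # sort values within each group
  mutate(min.diff = pmin(value - lag(value),   # gap to the next-smaller value (NA on the first row)
                         lead(value) - value,  # gap to the next-larger value (NA on the last row)
                         na.rm = TRUE)) %>%    # use whichever neighbour exists
  ungroup() %>%
  arrange(row) %>%                             # restore the original order
  select(-row)

res
```

A group with a single observation has no neighbour on either side, so its min.diff comes out as NA.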

Upvotes: 0

Ronak Shah

Reputation: 389355

We can use map_dbl to subtract the current value from every other value in its group and take the minimum of the absolute differences.

library(dplyr)
library(purrr)

df %>%
  group_by(group) %>%
  mutate(min.diff = map_dbl(row_number(), ~min(abs(value[-.x] - value[.x]))))
       

#  group value min.diff
#  <chr> <int>    <dbl>
#1 A         9        2
#2 A         4        3
#3 A         7        2
#4 B         1        1
#5 B         2        1
#6 C         7        4
#7 C         2        1
#8 C         3        1
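One edge case to keep in mind: for a group with a single observation, value[-.x] is empty, so min() returns Inf with a warning. A hedged sketch of a guarded variant follows; `nearest_dist` is a hypothetical helper written for this example (it is not part of dplyr or purrr), and `df2` is made-up data containing a one-row group.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: distance from v[i] to the closest other element of v,
# or NA when v has no other element
nearest_dist <- function(v, i) {
  others <- v[-i]
  if (length(others) == 0) return(NA_real_)
  min(abs(others - v[i]))
}

# Illustrative data: group B has only one observation
df2 <- data.frame(
  group = c("A", "A", "A", "B"),
  value = c(9, 4, 7, 1)
)

res <- df2 %>%
  group_by(group) %>%
  mutate(min.diff = map_dbl(row_number(), ~ nearest_dist(value, .x))) %>%
  ungroup()

res
```

Here the three rows of group A get 2, 3 and 2, while the lone row of group B gets NA instead of Inf.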

Upvotes: 1

akrun

Reputation: 887991

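We can use combn to compute the pairwise differences within 'value' and take the minimum of their absolute values. After grouping by 'group', this gives the smallest pairwise distance in each group (one value per group, repeated on each of its rows):

library(dplyr)
df1 <- df %>% 
  group_by(group) %>%
  mutate(new = min(abs(combn(value, 2, FUN = function(x) x[1] - x[2])))) %>%
  ungroup()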

If we want the minimum distance between one given element (here, the first in each group) and the rest:

df1 <- df %>%
  group_by(group) %>%
  mutate(new = min(abs(value[-1] - first(value)))) %>%
  ungroup()

Upvotes: 1
