R: compute the smallest Euclidean distance between two datasets and label it automatically

Question

I'm working with Euclidean distances between a pair of datasets. First of all, here is my data.

centers <- data.frame(x_ce = c(300,180,450,500),
                      y_ce = c(23,15,10,20),
                      center = c('a','b','c','d'))

points <- data.frame(point = c('p1','p2','p3','p4'),
                     x_p = c(160,600,400,245),
                     y_p = c(7,23,56,12))

My goal is to find, for each point in points, the center in centers at the smallest distance, append that closest center's name to the points dataset, and make this procedure automatic.

So I started with the base:

#Euclidean distance
sqrt(sum((x-y)^2))
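
As a quick check of the formula, here is a minimal sketch for a single pair, assuming x is point p1 and y is center b from the data above (the variable d is only illustrative):

```r
# Coordinates taken from the data above
x <- c(160, 7)    # p1: x_p, y_p
y <- c(180, 15)   # center b: x_ce, y_ce

# Euclidean distance: square root of the summed squared differences
d <- sqrt(sum((x - y)^2))
d   # sqrt(20^2 + 8^2) = sqrt(464), roughly 21.54
```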

I have in mind how it should work, but I cannot work out how to make it automatic:

1. Choose one row of points and all the rows of centers.
2. Calculate the Euclidean distance between that row and each row of centers.
3. Choose the smallest distance.
4. Attach the label of the center with the smallest distance.
5. Repeat for the second row, and so on until the end of points.

I managed to do it manually, to lay out all the steps that need to be automated:

# 1.  
x = (points[1,2:3])   # select the first of points
y1 = (centers[1,1:2]) # select the first center
y2 = (centers[2,1:2]) # select the second center
y3 = (centers[3,1:2]) # select the third center
y4 = (centers[4,1:2]) # select the fourth center

# 2.
# then the distances
distances <- data.frame(distance = c(
                                    sqrt(sum((x-y1)^2)),
                                    sqrt(sum((x-y2)^2)),
                                    sqrt(sum((x-y3)^2)),
                                    sqrt(sum((x-y4)^2))),
                                    center = centers$center
                                    )

# 3.
# then I choose the row with the smallest distance
d <- distances[which(distances$distance==min(distances$distance)),]

# 4.
# last, I put the label near the point
cbind(points[1,],d)

# 5. 
# then I restart for the second point

The problem is that I cannot automate it. Do you have any idea how to apply this procedure automatically to each point in points? Furthermore, am I reinventing the wheel, i.e. is there an existing, faster function for this that I don't know about?

AntoniosK · Accepted Answer

centers <- data.frame(x_ce = c(300,180,450,500),
                      y_ce = c(23,15,10,20),
                      center = c('a','b','c','d'))

points <- data.frame(point = c('p1','p2','p3','p4'),
                     x_p = c(160,600,400,245),
                     y_p = c(7,23,56,12))

library(tidyverse)

points %>%
  mutate(c = list(centers)) %>%      # attach the whole centers table to every point as a list column
  unnest(c) %>%                      # expand it: all possible combinations of points and centers as a dataframe
  rowwise() %>%                      # for each combination
  mutate(d = sqrt(sum((c(x_p,y_p)-c(x_ce,y_ce))^2))) %>%   # calculate distance
  ungroup() %>%                                            # forget the grouping
  group_by(point, x_p, y_p) %>%                            # for each point
  summarise(closest_center = center[d == min(d)]) %>%      # keep the closest center
  ungroup()                                                # forget the grouping

# # A tibble: 4 x 4
#   point   x_p   y_p closest_center
#               
# 1 p1      160     7 b             
# 2 p2      600    23 d             
# 3 p3      400    56 c             
# 4 p4      245    12 a
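
For comparison, the five steps from the question can also be sketched in base R without any packages. This is an alternative to the pipeline above, not part of the original answer; the names dist_mat, nearest and result are illustrative only. It builds the full point-by-center distance matrix with outer() and picks the nearest center per row with which.min(), which returns the first minimum if two centers are equally close.

```r
centers <- data.frame(x_ce = c(300, 180, 450, 500),
                      y_ce = c(23, 15, 10, 20),
                      center = c('a', 'b', 'c', 'd'))

points <- data.frame(point = c('p1', 'p2', 'p3', 'p4'),
                     x_p = c(160, 600, 400, 245),
                     y_p = c(7, 23, 56, 12))

# dist_mat[i, j] = Euclidean distance from point i to center j
dist_mat <- sqrt(outer(points$x_p, centers$x_ce, "-")^2 +
                 outer(points$y_p, centers$y_ce, "-")^2)

# Column index of the nearest center for each point (first one wins on ties)
nearest <- apply(dist_mat, 1, which.min)

# Append the label and the corresponding distance to a copy of points
result <- points
result$closest_center <- as.character(centers$center[nearest])
result$distance <- dist_mat[cbind(seq_len(nrow(points)), nearest)]
result
```

With the data above this assigns b, d, c and a to p1 through p4, the same labels as the tidyverse output. The distance matrix has one cell per point-center pair, so for very large inputs a nearest-neighbour package may be a better fit.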
