Reputation: 9485
I'm working with Euclidean Distance with a pair of dataset. First of all, my data.
centers <- data.frame(x_ce = c(300,180,450,500),
y_ce = c(23,15,10,20),
center = c('a','b','c','d'))
points <- data.frame(point = c('p1','p2','p3','p4'),
x_p = c(160,600,400,245),
y_p = c(7,23,56,12))
My goal is to find, for each point in points
, the smallest distance from all the center in centers
, and append the center name to the points
dataset (clearly the smallest one's), and make this procedure automatic.
So I started with the base:
#Euclidean distance
sqrt(sum((x-y)^2))
The fact that I have in my mind how it should work, but I cannot manage how to make it automatic.
points
, and all the rows of centers
centers
points
So I managed to do it manually, to have all the steps to make it automatic:
# 1.
x = (points[1,2:3]) # select the first of points
y1 = (centers[1,1:2]) # select the first center
y2 = (centers[2,1:2]) # select the second center
y3 = (centers[3,1:2]) # select the third center
y4 = (centers[4,1:2]) # select the fourth center
# 2.
# then the distances
distances <- data.frame(distance = c(
sqrt(sum((x-y1)^2)),
sqrt(sum((x-y2)^2)),
sqrt(sum((x-y3)^2)),
sqrt(sum((x-y4)^2))),
center = centers$center
)
# 3.
# then I choose the row with the smallest distance
d <- distances[which(distances$distance==min(distances$distance)),]
# 4.
# last, I put the label near the point
cbind(points[1,],d)
# 5.
# then I restart for the second point
The problem is that I cannot manage it automatically. have you got any idea to make this procedure automatic for each points of points
?
Furthermore, am I reinventing the wheel, i.e. does it exist a faster procedure (as a function) that I don't know?
Upvotes: 3
Views: 1079
Reputation: 5530
With the dplyr
package, you can use group_by
to loop over each point and mutate
to form a list of distances, set distance
as the min of the list, and set center
as the name of the minimum distance center. I've included two alternatives for the cases of duplicate rows or point names.
library(dplyr)
centers <- data.frame(x_ce = c(300,180,450,500),
y_ce = c(23,15,10,20),
center = c('a','b','c','d'))
points <- data.frame(point = c('p1','p2','p3','p4', "p4"),
x_p = c(160,600,400,245, 245),
y_p = c(7,23,56,12, 12))
#
# If duplicate rows need to be removed
#
result1 <- points %>% group_by(point) %>% distinct() %>%
mutate(lst = with(centers, list(sqrt( (x_p-x_ce)^2 + (y_p-y_ce)^2 ) ) ),
distance=min(unlist(lst)),
center = centers$center[which.min(unlist(lst))]) %>%
select(-lst)
which gives the result
# A tibble: 4 x 5
# Groups: point [4]
point x_p y_p distance center
<fct> <dbl> <dbl> <dbl> <fct>
1 p1 160 7 21.5 b
2 p2 600 23 100. d
3 p3 400 56 67.9 c
4 p4 245 12 56.1 a
and
#
# Alternative if point names are not unique
#
points <- data.frame(point = c('p1','p2','p3','p4', "p4"),
x_p = c(160,600,400,245, 550),
y_p = c(7,23,56,12, 25))
result2 <- points %>% rowwise() %>%
mutate( lst = with(centers, list(sqrt( (x_p-x_ce)^2 + (y_p-y_ce)^2 ) ) ),
distance=min(unlist(lst)),
center = centers$center[which.min(unlist(lst))]) %>%
ungroup() %>% select(-lst)
with the result
# A tibble: 5 x 5
point x_p y_p distance center
<fct> <dbl> <dbl> <dbl> <fct>
1 p1 160 7 21.5 b
2 p2 600 23 100. d
3 p3 400 56 67.9 c
4 p4 245 12 56.1 a
5 p4 550 25 50.2 d
Upvotes: 2
Reputation: 16121
centers <- data.frame(x_ce = c(300,180,450,500),
y_ce = c(23,15,10,20),
center = c('a','b','c','d'))
points <- data.frame(point = c('p1','p2','p3','p4'),
x_p = c(160,600,400,245),
y_p = c(7,23,56,12))
library(tidyverse)
points %>%
mutate(c = list(centers)) %>%
unnest() %>% # create all posible combinations of points and centers as a dataframe
rowwise() %>% # for each combination
mutate(d = sqrt(sum((c(x_p,y_p)-c(x_ce,y_ce))^2))) %>% # calculate distance
ungroup() %>% # forget the grouping
group_by(point, x_p, y_p) %>% # for each point
summarise(closest_center = center[d == min(d)]) %>% # keep the closest center
ungroup() # forget the grouping
# # A tibble: 4 x 4
# point x_p y_p closest_center
# <fct> <dbl> <dbl> <fct>
# 1 p1 160 7 b
# 2 p2 600 23 d
# 3 p3 400 56 c
# 4 p4 245 12 a
Upvotes: 2