I'm looking for a way to connect some points using ggplot in R. I want to connect each point to the nearest point. Here's what my data look like as a scatter plot.
x <- c(0.81,0.82,0.82,0.82,0.83,0.83,0.83,0.84,0.84,0.84,0.85,0.85,0.85,0.86,0.86,0.86,0.87,0.87,0.87,0.88,0.88,0.88,0.89,0.89,0.89,0.9,0.9,0.9,0.91,0.91,0.91,0.92,0.92,0.92,0.93,0.93,0.93,0.93,0.93,0.94,0.94,0.94,0.94,0.94,0.95,0.95,0.95,0.95,0.95,0.96,0.96,0.96,0.96,0.96,0.97,0.97,0.97,0.97,0.97,0.98,0.98,0.98,0.98,0.98,0.99,0.99,0.99,0.99,1,1,1,1,1.01,1.01,1.01,1.01,1.02,1.02,1.02,1.02,1.03,1.03,1.03,1.03,1.04,1.04,1.04,1.04,1.05,1.05,1.05,1.05,1.06,1.06,1.06,1.06,1.07,1.07,1.07,1.07,1.08,1.08,1.08,1.08,1.09,1.09,1.09,1.09,1.1,1.1,1.1,1.1,1.11,1.11,1.11,1.11,1.12,1.12,1.12,1.12,1.13,1.13,1.13,1.13,1.14,1.14,1.15,1.15,1.16,1.16,1.17,1.17,1.18,1.18,1.19,1.19,1.2,1.2,1.21,1.21,1.22,1.22,1.23,1.23,1.24,1.24,1.25,1.25,1.26,1.26,1.27)
y <- c(-1.295,-0.535,-1.575,-1.295,-0.525,-1.575,-1.295,-0.515,-1.575,-1.285,-0.515,-1.575,-1.285,-0.505,-1.575,-1.275,-0.495,-1.575,-1.275,-0.485,-1.575,-1.265,-0.485,-1.575,-1.265,-0.475,-1.575,-1.255,-0.465,-1.575,-1.255,-0.455,-1.575,-1.245,-0.445,1.285,1.545,-1.575,-1.245,-0.435,1.165,1.675,-1.575,-1.235,-0.425,1.085,1.765,-1.575,-1.235,-0.405,1.015,1.845,-1.575,-1.225,-0.395,0.965,1.905,-1.575,-1.215,-0.385,0.915,1.965,-1.575,-1.215,-0.375,0.865,-1.575,-1.205,-0.355,0.825,-1.575,-1.205,-0.345,0.785,-1.565,-1.195,-0.325,0.745,-1.565,-1.185,-0.305,0.705,-1.565,-1.185,-0.285,0.665,-1.565,-1.175,-0.265,0.625,-1.565,-1.165,-0.245,0.585,-1.565,-1.165,-0.225,0.545,-1.565,-1.155,-0.195,0.495,-1.555,-1.145,-0.165,0.455,-1.555,-1.145,-0.135,0.405,-1.555,-1.135,-0.0849999999999999,0.345,-1.555,-1.125,-0.035,0.275,-1.545,-1.115,0.0850000000000001,0.145,-1.545,-1.115,-1.545,-1.105,-1.545,-1.095,-1.535,-1.085,-1.535,-1.085,-1.535,-1.075,-1.525,-1.065,-1.525,-1.055,-1.525,-1.045,-1.515,-1.045,-1.515,-1.035,-1.505,-1.025,-1.505,-1.015,-1.495,-1.005,-1.495)
library(tibble)
library(ggplot2)
example_df <- tibble(x = x, y = y)
ggplot(example_df, aes(x = x, y = y)) +
  geom_point()
geom_path connects coordinates in the order in which they appear in the data frame, and geom_line connects them in order of x. Is there an easy way to connect points according to the Euclidean distance between them?
This differs from Andrew Gustar's cut()-based answer only in how it separates the 3 paths. I wanted a somewhat more scalable process, so I used hierarchical clustering to put the points into 3 clusters based on the distances between them. In this case the clusters were easily separable; with other data it might be trickier, and you might need a different clustering algorithm. Then, following the other answer (+1 to them), I arrange each cluster by y-value so that the paths are drawn in the right order.
library(dplyr)
library(ggplot2)

example_df <- tibble(x = x, y = y)

clust <- hclust(dist(example_df), method = "single")

df_clustered <- example_df %>%
  mutate(cluster = as.factor(cutree(clust, k = 3))) %>%
  arrange(cluster, y)

ggplot(df_clustered, aes(x = x, y = y, color = cluster)) +
  geom_point() +
  geom_path()
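One way to check whether k = 3 is a safe cut, here or on other data, is to look at the merge heights. With single linkage, each value in clust$height is the smallest distance between the two clusters being joined, so cutree(clust, k = 3) only separates the paths cleanly if the two largest heights (the gaps between paths) are clearly bigger than every gap between neighbouring points on the same path. A minimal sketch on made-up toy data (the names toy, clust_toy, merge_heights and groups are just for illustration):

```r
# Made-up toy data: two parallel "paths" 1 unit apart,
# with neighbouring points about 0.1 apart along each path
toy <- data.frame(
  x = rep(seq(0, 0.5, by = 0.1), times = 2),
  y = rep(c(0, 1), each = 6)
)

clust_toy <- hclust(dist(toy), method = "single")

# With single linkage each merge height is the smallest gap between the two
# clusters being joined, so the largest height is the gap between the paths
merge_heights <- sort(clust_toy$height, decreasing = TRUE)
print(merge_heights[1:2])

# Cutting into k = 2 clusters undoes only the single largest merge
groups <- cutree(clust_toy, k = 2)
print(groups)
```

On the answer's own clust object, sort(clust$height, decreasing = TRUE)[1:3] shows how much margin the k = 3 cut has: the third value should be noticeably smaller than the second.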
Another answer, which works for this data but not in general:
example_df$group <- cut(example_df$y,
                        breaks = c(Inf, -0.8, -1.4, -Inf)) # breaks determined 'by eye'
example_df <- example_df[order(example_df$y), ] # sort by y
ggplot(example_df, aes(x = x, y = y, group = group)) +
  geom_point() +
  geom_path(colour = "blue")
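A small check of the grouping step, on made-up toy values rather than the question's data: cut() sorts its break points before binning, so the reversed breaks vector still defines three intervals, and after ordering by y each group's points appear in increasing y, which is the order geom_path() draws them in. This is only a sketch; as the answer says, the approach depends on each curve being drawable in y order once its points are sorted.

```r
# Made-up toy values from three well-separated bands of y
toy <- data.frame(
  x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  y = c(-1.5, -1.52, -1.55, -1.2, -1.1, -1.0, 0.3, 0.1, -0.2)
)

# cut() sorts the breaks, so c(Inf, -0.8, -1.4, -Inf) gives the same intervals
# as c(-Inf, -1.4, -0.8, Inf): (-Inf,-1.4], (-1.4,-0.8] and (-0.8,Inf]
toy$group <- cut(toy$y, breaks = c(Inf, -0.8, -1.4, -Inf))

# Sorting by y sets the order in which geom_path() joins points within a group
toy <- toy[order(toy$y), ]
print(toy)
```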
Here is a solution to the question as asked. I suspect it is not quite what you actually want, but it might help.
distmat <- as.matrix(dist(example_df[, c("x", "y")])) # Euclidean distances between rows, using x and y only
diag(distmat) <- Inf # ignore each point's zero distance to itself
nearest <- apply(distmat, 1, which.min) # index of the nearest other point for each point
example_df$xend <- example_df$x[nearest] # end point of the segment from each point
example_df$yend <- example_df$y[nearest]
ggplot(example_df, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_point() +
  geom_segment(colour = "blue")
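Linking each point only to its single nearest neighbour can produce disconnected fragments rather than one continuous line. A different technique, swapped in here and not taken from the answers, is a greedy nearest-neighbour ordering: start at one point, repeatedly step to the closest point not yet visited, and draw the rows in that order with geom_path(). The helper nn_order() is hypothetical, written for illustration, and the toy points are made up. Greedy ordering is not guaranteed to follow a curve, and the result depends on the starting point.

```r
# Hypothetical helper (not from any package): greedy nearest-neighbour ordering.
# Starting at row `start`, repeatedly move to the closest point not yet visited.
nn_order <- function(df, start = 1L) {
  d <- as.matrix(dist(df[, c("x", "y")]))  # Euclidean distances between rows
  n <- nrow(d)
  visited <- logical(n)
  path <- integer(n)
  current <- as.integer(start)
  for (i in seq_len(n)) {
    path[i] <- current
    visited[current] <- TRUE
    if (i < n) {
      candidates <- d[current, ]
      candidates[visited] <- Inf  # never step back to a visited point
      current <- which.min(candidates)
    }
  }
  path
}

# Made-up toy points whose row order is not the order along the line
toy <- data.frame(x = c(0, 3, 1, 2), y = c(0, 0, 0.1, 0))
ord <- nn_order(toy, start = 1)
print(ord)
```

To draw it, reorder the rows and use geom_path(), for example ggplot(example_df[nn_order(example_df), ], aes(x = x, y = y)) + geom_path(). On the full data set a single greedy path would still jump between the separate curves, so it would probably need to be run per group after one of the cut() or clustering steps above, starting each group at one end of its curve.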