big parma

Reputation: 337

Connecting points with ggplot in R

I'm looking for a way to connect some points using ggplot in R. I want to connect each point to the nearest point. Here's what my data look like as a scatter plot.

x <- c(0.81,0.82,0.82,0.82,0.83,0.83,0.83,0.84,0.84,0.84,0.85,0.85,0.85,0.86,0.86,0.86,0.87,0.87,0.87,0.88,0.88,0.88,0.89,0.89,0.89,0.9,0.9,0.9,0.91,0.91,0.91,0.92,0.92,0.92,0.93,0.93,0.93,0.93,0.93,0.94,0.94,0.94,0.94,0.94,0.95,0.95,0.95,0.95,0.95,0.96,0.96,0.96,0.96,0.96,0.97,0.97,0.97,0.97,0.97,0.98,0.98,0.98,0.98,0.98,0.99,0.99,0.99,0.99,1,1,1,1,1.01,1.01,1.01,1.01,1.02,1.02,1.02,1.02,1.03,1.03,1.03,1.03,1.04,1.04,1.04,1.04,1.05,1.05,1.05,1.05,1.06,1.06,1.06,1.06,1.07,1.07,1.07,1.07,1.08,1.08,1.08,1.08,1.09,1.09,1.09,1.09,1.1,1.1,1.1,1.1,1.11,1.11,1.11,1.11,1.12,1.12,1.12,1.12,1.13,1.13,1.13,1.13,1.14,1.14,1.15,1.15,1.16,1.16,1.17,1.17,1.18,1.18,1.19,1.19,1.2,1.2,1.21,1.21,1.22,1.22,1.23,1.23,1.24,1.24,1.25,1.25,1.26,1.26,1.27)

y <- c(-1.295,-0.535,-1.575,-1.295,-0.525,-1.575,-1.295,-0.515,-1.575,-1.285,-0.515,-1.575,-1.285,-0.505,-1.575,-1.275,-0.495,-1.575,-1.275,-0.485,-1.575,-1.265,-0.485,-1.575,-1.265,-0.475,-1.575,-1.255,-0.465,-1.575,-1.255,-0.455,-1.575,-1.245,-0.445,1.285,1.545,-1.575,-1.245,-0.435,1.165,1.675,-1.575,-1.235,-0.425,1.085,1.765,-1.575,-1.235,-0.405,1.015,1.845,-1.575,-1.225,-0.395,0.965,1.905,-1.575,-1.215,-0.385,0.915,1.965,-1.575,-1.215,-0.375,0.865,-1.575,-1.205,-0.355,0.825,-1.575,-1.205,-0.345,0.785,-1.565,-1.195,-0.325,0.745,-1.565,-1.185,-0.305,0.705,-1.565,-1.185,-0.285,0.665,-1.565,-1.175,-0.265,0.625,-1.565,-1.165,-0.245,0.585,-1.565,-1.165,-0.225,0.545,-1.565,-1.155,-0.195,0.495,-1.555,-1.145,-0.165,0.455,-1.555,-1.145,-0.135,0.405,-1.555,-1.135,-0.0849999999999999,0.345,-1.555,-1.125,-0.035,0.275,-1.545,-1.115,0.0850000000000001,0.145,-1.545,-1.115,-1.545,-1.105,-1.545,-1.095,-1.535,-1.085,-1.535,-1.085,-1.535,-1.075,-1.525,-1.065,-1.525,-1.055,-1.525,-1.045,-1.515,-1.045,-1.515,-1.035,-1.505,-1.025,-1.505,-1.015,-1.495,-1.005,-1.495)

library(tibble)
library(ggplot2)

example_df <- tibble(x = x, y = y)

ggplot(example_df, aes(x = x, y = y)) + 
  geom_point()

By default, geom_line connects points in order of their x values, and geom_path connects them in the order in which they appear in the data frame; neither takes proximity into account. Is there an easy way to connect points according to the Euclidean distance between them?

Upvotes: 0

Views: 1335

Answers (3)

camille

Reputation: 16871

This differs from Andrew Gustar's cut-based answer only in how it separates the 3 paths. I wanted a more scalable process, so I used hierarchical clustering to put the points into 3 clusters based on their pairwise distances. In this case, they were easily separable; with other data, it might be trickier and you might need a different clustering algorithm. Then, following the other answer (+1 to them), arrange each cluster by y-value so the paths are drawn in the right order.

library(dplyr)
library(ggplot2)

example_df <- tibble(x = x, y = y)
# single linkage merges clusters by their closest pair of points
clust <- hclust(dist(example_df), method = "single")

df_clustered <- example_df %>%
  mutate(cluster = as.factor(cutree(clust, k = 3))) %>%   # cut the tree into 3 clusters
  arrange(cluster, y)                                     # geom_path draws in row order

ggplot(df_clustered, aes(x = x, y = y, color = cluster)) +
  geom_point() +
  geom_path()
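
For intuition on why single linkage fits here, below is a small base-R sketch on made-up data (three horizontal bands rather than the question's points; `toy`, `toy_clust` and `bands` are names invented for illustration). Because single linkage merges clusters by their closest pair of points, a chain of closely spaced points stays together as long as the gap between bands is larger than the spacing along them.

```r
# Toy data (hypothetical): three horizontal bands of 5 points each,
# spaced 1 apart along x and 5 apart in y.
toy <- data.frame(
  x = rep(1:5, times = 3),
  y = rep(c(0, 5, 10), each = 5)
)

# Single linkage: distance between clusters = smallest pairwise distance.
toy_clust <- hclust(dist(toy), method = "single")
toy$cluster <- cutree(toy_clust, k = 3)

# Cross-tabulate band (y value) against assigned cluster.
bands <- table(toy$y, toy$cluster)
print(bands)
```

Each band should fall entirely into one cluster. If the bands were closer together than the points along them, this would likely break down and another method or a different `k` would be needed.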

Upvotes: 1

Andrew Gustar

Reputation: 18435

Another answer, which works for this data but not in general, because the breaks are chosen by eye:

example_df$group <- cut(example_df$y, 
                        breaks = c(Inf, -0.8, -1.4, -Inf))     #breaks determined 'by eye'
example_df <- example_df[order(example_df$y), ]                #sort by y
ggplot(example_df, aes(x = x, y = y, group = group)) + 
  geom_point() +
  geom_path(colour = "blue")
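
As a quick check of what `cut` does with those breaks, here is a sketch on three made-up y values, one per band (`y_demo`, `g_desc` and `g_asc` are hypothetical names). `cut` sorts its `breaks` internally, so the descending order used above should give the same groups as ascending order.

```r
# One illustrative y value from each of the three bands (made up).
y_demo <- c(-1.5, -1.2, 0.5)

# Same break points, written in descending and ascending order.
g_desc <- cut(y_demo, breaks = c(Inf, -0.8, -1.4, -Inf))
g_asc  <- cut(y_demo, breaks = c(-Inf, -1.4, -0.8, Inf))

as.integer(g_desc)         # group index of each value
identical(g_desc, g_asc)   # break order does not change the grouping
```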

Upvotes: 2

Andrew Gustar

Reputation: 18435

Here is a solution to the question as asked (connect each point to its nearest neighbour), although I suspect it is not quite what you actually wanted. It might still help.

distmat <- as.matrix(dist(example_df[, c("x", "y")]))    #matrix of Euclidean distances between rows (x and y columns only)
diag(distmat) <- Inf                      #remove zeros on diagonal
nearest <- apply(distmat, 1, which.min)   #find index of nearest point to each point
example_df$xend <- example_df$x[nearest]  #set end point of segment from each point
example_df$yend <- example_df$y[nearest]

ggplot(example_df, aes(x = x, y = y, xend = xend, yend = yend)) + 
  geom_point() +
  geom_segment(colour = "blue")
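
To make the nearest-neighbour step concrete, here is the same computation on four made-up points (`pts` and `d` are hypothetical names, not the question's data). It also shows why this gives short links rather than continuous paths: the relation is not symmetric, so point 3's nearest neighbour is point 2, while point 2's nearest neighbour is point 1.

```r
# Four illustrative points (made up).
pts <- data.frame(x = c(0, 0, 0, 5), y = c(0, 1, 3, 5))

d <- as.matrix(dist(pts))   # pairwise Euclidean distances
diag(d) <- Inf              # a point must not be its own neighbour

# Row-wise index of the closest other point.
nearest <- unname(apply(d, 1, which.min))
nearest   # 2 1 2 3
```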

Upvotes: 3
