jake9115

Reputation: 4084

How to plot a series of coordinates as rectangles using ggplot2 and R so that they won't overlap?

I recently asked a question on SO about how to make rectangles from a series of coordinates; the link is here.

The answer was perfect and lets me generate my rectangles really well:

# Sample data
plot.data <- data.frame(start.points=c(5, 32),
                        end.points=c(15, 51), 
                        text.label=c("Sample A", "Sample B"))
plot.data$text.position <- (plot.data$start.points + plot.data$end.points)/2

# Plot using ggplot
library(ggplot2)
p <- ggplot(plot.data)
p + geom_rect(aes(xmin=start.points, xmax=end.points, ymin=0, ymax=3), 
              fill="yellow") + 
  theme_bw() + geom_text(aes(x=text.position, y=1.5, label=text.label)) + 
  labs(x=NULL, y=NULL)

However, I realized that my data often has overlapping coordinates, and I want to be able to visualize each individual span without washing out overlapping spans. So, let's use this as an example data set: 2-3, 5-10, 7-10
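In the same layout as the sample data above, that example would look roughly like this (the name overlap.data and the labels are placeholders made up for illustration):

```r
# The overlapping example spans (2-3, 5-10, 7-10) in the same layout as plot.data.
# overlap.data and the "Sample" labels are hypothetical, for illustration only.
overlap.data <- data.frame(start.points = c(2, 5, 7),
                           end.points   = c(3, 10, 10),
                           text.label   = c("Sample A", "Sample B", "Sample C"))
# Horizontal centre of each span, used to place its label
overlap.data$text.position <- (overlap.data$start.points + overlap.data$end.points) / 2
```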

The current code will give something like:

    ----    -----------------             
----|  |----|               |-------------
    ----    -----------------             

However, I want to somehow change the code so that overlapping data will be visualized on a new track:

    ----    -----------------             
----|  |----|               |-------------
    ----    -----------------             

                -------------             
----------------|           |---------
                -------------             

Sorry for the stupid ASCII art!

Does anyone have a suggestion? I wouldn't be averse to independently generating several images and then stacking them, if that's easiest. Thanks!

Upvotes: 4

Views: 961

Answers (1)

Robert Krzyzanowski

Reputation: 9344

You could compute groups of mutually non-overlapping intervals by hand and space out the rectangles accordingly. Here is one way to do it with the intervals package (note that this assumes your rows are sorted by start.points, which is easy to arrange):

library(intervals)
plot.data <- data.frame(start.points = c(1,2,4,6,8,11), end.points = c(3,5,9,10,12,13),
                        text.label = paste0('Sample ', LETTERS[1:6]))
plot.data$text.position <- (plot.data$start.points + plot.data$end.points)/2

overlap <- interval_overlap(tmp <- Intervals(c(plot.data$start.points, plot.data$end.points)), tmp)
# Find the next non-overlapping interval
nexts <- lapply(overlap, function(x) max(x) + 1)
non_overlaps <- list()
while(sum(sapply(nexts, Negate(is.na))) > 0) {
  consec <- c()
  i <- which(sapply(nexts, Negate(is.na)))[1]

  # Find a stretch of consecutive non-overlapping intervals
  while(!is.na(i) && i <= length(nexts) && !any(sapply(non_overlaps, function(y) i %in% y))) {
    consec <- c(consec, i); i <- nexts[[i]]
  }

  non_overlaps <- append(non_overlaps, list(consec))
  # Wipe out that stretch since we're no longer looking at it
  nexts[consec] <- NA
}

# Squash remaining non-overlapping intervals -- the packing is not yet compact
i <- 1
while (i < length(non_overlaps)) {
  ints1 <- non_overlaps[[i]]
  ints1 <- Intervals(c(plot.data$start.points[ints1], plot.data$end.points[ints1]))
  j <- i + 1
  while(j <= length(non_overlaps)) {
    ints2 <-  Intervals(c(plot.data$start.points[non_overlaps[[j]]],
                  plot.data$end.points[non_overlaps[[j]]]))
    iv <- interval_overlap(ints1, ints2)
    if (length(c(iv, recursive = TRUE)) == 0) break;
    j <- j + 1
  }

  if (j <= length(non_overlaps)) {
    # we can merge non_overlaps[[i]] and non_overlaps[[j]]
    non_overlaps[[i]] <- c(non_overlaps[[i]], non_overlaps[[j]])
    non_overlaps[[j]] <- NULL
  } else {
    # we are done with non_overlaps[[i]] -- nothing else can be squashed!
    i <- i + 1
  }
}

We now have

 print(non_overlaps)
 # [[1]]
 # [1] 1 3 6
 #
 # [[2]]
 # [1] 2 4
 #
 # [[3]]
 # [1] 5

We can graph these non-overlapping intervals on separate heights.

 ymin <- length(non_overlaps) - 1 - (sapply(seq_len(nrow(plot.data)),
    function(ix) which(sapply(non_overlaps, function(y) ix %in% y))) - 1)
 ymax <- ymin + 0.9
 text.position.y <- ymin + 0.45
 ymin <- ymin / length(non_overlaps) * 3 # rescale for display
 ymax <- ymax / length(non_overlaps) * 3 # rescale for display
 text.position.y <- text.position.y / length(non_overlaps) * 3

 library(ggplot2)
 p <- ggplot(plot.data)
 p + geom_rect(aes(xmin=start.points, xmax=end.points, ymin=ymin, ymax=ymax),
               fill="yellow") +
   theme_bw() + geom_text(aes(x=text.position, y=text.position.y, label=text.label)) +
   labs(x=NULL, y=NULL)

The final result:

[Image: the six sample rectangles drawn on three tracks]

Some more examples:

plot.data <- data.frame(start.points = c(1,3,5,7,9,11,13), end.points = c(4,6,8,10,12,14, 16), text.label = paste0('Sample ', LETTERS[1:7]))

[Image: plot for this example]

plot.data <- data.frame(start.points = seq(1, 13, by = 4), end.points = seq(4, 16, by = 4), text.label = paste0('Sample ', LETTERS[1:4]))

[Image: plot for this example]

set.seed(100); plot.data <- data.frame(start.points = tmp <- sort(runif(26, 1, 15)), end.points = tmp + runif(26, 1, 3), text.label = paste0('Sample ', LETTERS))

[Image: plot for this example]

P.S. I apologize for the chicken scratch, but I did this rather hastily -- I am sure some of these operations can be performed more cleverly!
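For what it's worth, here is a minimal sketch of a more compact alternative: a greedy left-to-right pass in base R. This is a different technique from the loops above, not a cleanup of them. It puts each interval on the first track whose last rectangle ends before the interval starts. The helper name assign_tracks is made up for illustration, and touching end points are treated as overlapping, on the assumption that the intervals are closed.

```r
# Greedy track assignment in base R (no extra packages needed).
# assign_tracks() is a hypothetical helper name, not part of any package.
assign_tracks <- function(starts, ends) {
  ord <- order(starts, ends)        # visit intervals from left to right
  track <- integer(length(starts))  # track[i] = row on which interval i is drawn
  track_end <- numeric(0)           # end point of the last interval placed on each track
  for (i in ord) {
    # Assumption: intervals are closed, so a track is free only if its
    # last interval ends strictly before this one starts.
    free <- which(track_end < starts[i])
    k <- if (length(free) > 0) free[1] else length(track_end) + 1L
    track[i] <- k
    track_end[k] <- ends[i]         # extends track_end when k is a new track
  }
  track
}

plot.data <- data.frame(start.points = c(1, 2, 4, 6, 8, 11),
                        end.points   = c(3, 5, 9, 10, 12, 13),
                        text.label   = paste0("Sample ", LETTERS[1:6]))
plot.data$track <- assign_tracks(plot.data$start.points, plot.data$end.points)
```

On this sample data the tracks come out as 1, 2, 1, 2, 3, 1, i.e. the groups {1, 3, 6}, {2, 4} and {5}. The track column can then be mapped inside aes(), for instance ymin = track - 1 and ymax = track - 0.1, instead of building separate ymin and ymax vectors.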

Upvotes: 5
