Reputation: 4084
I recently asked a question on SO about how to make rectangles from a series of coordinates.
The answer was perfect, and allows me to generate my rectangles really well:
# Sample data
plot.data <- data.frame(start.points=c(5, 32),
                        end.points=c(15, 51),
                        text.label=c("Sample A", "Sample B"))
plot.data$text.position <- (plot.data$start.points + plot.data$end.points)/2
# Plot using ggplot
library(ggplot2)
p <- ggplot(plot.data)
p + geom_rect(aes(xmin=start.points, xmax=end.points, ymin=0, ymax=3),
              fill="yellow") +
  theme_bw() + geom_text(aes(x=text.position, y=1.5, label=text.label)) +
  labs(x=NULL, y=NULL)
However, I realized that my data often has overlapping coordinates, and I want to be able to visualize each individual span without washing out overlapping spans. So, let's use this as an example data set: 2-3, 5-10, 7-10
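Written in the same layout as the sample data above, that example might look like the following sketch. The labels are made-up placeholders, and the overlap check assumes the rows are sorted by start point and that touching endpoints count as overlapping.

```r
# Example spans 2-3, 5-10 and 7-10 (labels are placeholders, not real data)
plot.data <- data.frame(start.points = c(2, 5, 7),
                        end.points   = c(3, 10, 10),
                        text.label   = c("Sample A", "Sample B", "Sample C"))

# TRUE where a span overlaps the span directly before it
# (assumes rows are sorted by start.points)
overlaps.previous <- c(FALSE,
                       head(plot.data$end.points, -1) >= tail(plot.data$start.points, -1))
```

Here only the third span (7-10) is flagged, because it starts before the second span (5-10) ends.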
The current code will give something like:
     ----      -----------------
----|    |----|                 |-------------
     ----      -----------------
However, I want to somehow change the code so that overlapping data will be visualized on a new track:
     ----      -----------------
----|    |----|                 |-------------
     ----      -----------------
                 -------------
----------------|             |---------
                 -------------
Sorry for the stupid ASCII art!
Does anyone have a suggestion? I wouldn't be averse to independently generating several images and then stacking them, if that's easiest. Thanks!
Upvotes: 4
Views: 961
Reputation: 9344
You could compute sequences of non-overlapping intervals by hand and space out the rectangles accordingly. Here it is with the intervals package. Note that we assume your points are ordered by start.points, which is easy to do.
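The ordering assumption can be met with a single call to order(). A minimal sketch, where `unsorted` is a hypothetical data frame with the same column names as in the question:

```r
# Hypothetical unsorted input (same columns as the question's sample data)
unsorted <- data.frame(start.points = c(32, 5),
                       end.points   = c(51, 15),
                       text.label   = c("Sample B", "Sample A"))

# Reorder rows by start point, then reset row names so that
# row indices match row positions again
sorted <- unsorted[order(unsorted$start.points), ]
rownames(sorted) <- NULL
```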
library(intervals)
plot.data <- data.frame(start.points = c(1,2,4,6,8,11), end.points = c(3,5,9,10,12,13),
                        text.label = paste0('Sample ', LETTERS[1:6]))
plot.data$text.position <- (plot.data$start.points + plot.data$end.points)/2
# For each interval, the indices of all intervals it overlaps (including itself)
tmp <- Intervals(c(plot.data$start.points, plot.data$end.points))
overlap <- interval_overlap(tmp, tmp)
# Find the next non-overlapping interval
nexts <- lapply(overlap, function(x) max(x) + 1)
non_overlaps <- list()
while(sum(sapply(nexts, Negate(is.na))) > 0) {
  consec <- c()
  i <- which(sapply(nexts, Negate(is.na)))[1]
  # Find a stretch of consecutive non-overlapping intervals
  while(!is.na(i) && i <= length(nexts) && !any(sapply(non_overlaps, function(y) i %in% y))) {
    consec <- c(consec, i); i <- nexts[[i]]
  }
  non_overlaps <- append(non_overlaps, list(consec))
  # Wipe out that stretch since we're no longer looking at it
  nexts[consec] <- NA
}
# Squash remaining non-overlapping intervals, since the packing is not yet compact
i <- 1
while (i < length(non_overlaps)) {
  ints1 <- non_overlaps[[i]]
  ints1 <- Intervals(c(plot.data$start.points[ints1], plot.data$end.points[ints1]))
  j <- i + 1
  while(j <= length(non_overlaps)) {
    ints2 <- Intervals(c(plot.data$start.points[non_overlaps[[j]]],
                         plot.data$end.points[non_overlaps[[j]]]))
    iv <- interval_overlap(ints1, ints2)
    # Stop at the first later group that shares no overlap with group i
    if (length(c(iv, recursive = TRUE)) == 0) break
    j <- j + 1
  }
  if (j <= length(non_overlaps)) {
    # we can merge non_overlaps[[i]] and non_overlaps[[j]]
    non_overlaps[[i]] <- c(non_overlaps[[i]], non_overlaps[[j]])
    non_overlaps[[j]] <- NULL
  } else {
    # we are done with non_overlaps[[i]]: nothing else can be squashed!
    i <- i + 1
  }
}
We now have
print(non_overlaps)
# [[1]]
# [1] 1 3 6
#
# [[2]]
# [1] 2 4
#
# [[3]]
# [1] 5
We can graph these non-overlapping intervals on separate heights.
# Row index -> group index, flipped so that the first group is drawn on top
ymin <- length(non_overlaps) - 1 - (sapply(seq_len(nrow(plot.data)),
  function(ix) which(sapply(non_overlaps, function(y) ix %in% y))) - 1)
ymax <- ymin + 0.9
text.position.y <- ymin + 0.45
ymin <- ymin / length(non_overlaps) * 3 # rescale for display
ymax <- ymax / length(non_overlaps) * 3 # rescale for display
text.position.y <- text.position.y / length(non_overlaps) * 3
library(ggplot2)
p <- ggplot(plot.data)
p + geom_rect(aes(xmin=start.points, xmax=end.points, ymin=ymin, ymax=ymax),
              fill="yellow") +
  theme_bw() + geom_text(aes(x=text.position, y=text.position.y, label=text.label)) +
  labs(x=NULL, y=NULL)
The final result:
Some more examples:
plot.data <- data.frame(start.points = c(1,3,5,7,9,11,13), end.points = c(4,6,8,10,12,14, 16), text.label = paste0('Sample ', LETTERS[1:7]))
plot.data <- data.frame(start.points = seq(1, 13, by = 4), end.points = seq(4, 16, by = 4), text.label = paste0('Sample ', LETTERS[1:4]))
set.seed(100); plot.data <- data.frame(start.points = tmp <- sort(runif(26, 1, 15)), end.points = tmp + runif(26, 1, 3), text.label = paste0('Sample ', LETTERS))
P.S. I apologize for the chicken scratch, but I did this rather hastily -- I am sure some of these operations can be performed more cleverly!
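As one possible simplification along those lines, the track assignment can also be sketched as a greedy pass in base R, without the intervals package. This is not the code above: `assign_tracks` is a hypothetical helper, it assumes the rows are sorted by start.points, and it treats touching endpoints as overlapping.

```r
# Greedy track assignment: put each interval on the first track whose
# last interval ends strictly before this one starts.
# Assumes `start` is sorted ascending; touching endpoints count as overlap.
assign_tracks <- function(start, end) {
  track.end <- numeric(0)            # right edge of the last interval on each track
  track <- integer(length(start))
  for (i in seq_along(start)) {
    free <- which(track.end < start[i])
    if (length(free) > 0) {
      k <- free[1]                   # reuse the first free track
    } else {
      k <- length(track.end) + 1L    # otherwise open a new track
    }
    track.end[k] <- end[i]
    track[i] <- k
  }
  track
}

plot.data <- data.frame(start.points = c(1, 2, 4, 6, 8, 11),
                        end.points   = c(3, 5, 9, 10, 12, 13))
plot.data$track <- assign_tracks(plot.data$start.points, plot.data$end.points)
# track is 1 2 1 2 3 1

# Stack track 1 on top, one unit of height per track
plot.data$ymin <- max(plot.data$track) - plot.data$track
plot.data$ymax <- plot.data$ymin + 0.9
```

The ymin and ymax columns could then be mapped inside geom_rect(aes(...)) in the same way as in the plotting code above, with the same optional rescaling for display.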
Upvotes: 5