Reputation: 459
I have a dataframe in R like this:
dat = data.frame(Sample = c(1,1,2,2,3), Start = c(100,300,150,200,160), Stop = c(180,320,190,220,170))
And I would like to plot it such that the x-axis is the position and the y-axis is the number of samples at that position, with each sample in a different colour. So in the above example you would have some positions with height 1, some with height 2 and one area with height 3. The aim being to find regions where there are a large number of samples and what samples are in that region.
i.e. something like:
&
---
********- -- **
where * = Sample 1, - = Sample 2 and & = Sample 3
Upvotes: 2
Views: 991
Reputation: 14667
This hack may be what you're looking for, however I've greatly increased the size of the dataframe in order to take advantage of stacking by geom_histogram
.
library(ggplot2)
dat = data.frame(Sample = c(1,1,2,2,3),
Start = c(100,300,150,200,160),
Stop = c(180,320,190,220,170))
# Reformat the data for plotting with geom_histogram.
dat2 = matrix(ncol=2, nrow=0, dimnames=list(NULL, c("Sample", "Position")))
for (i in seq(nrow(dat))) {
Position = seq(dat[i, "Start"], dat[i, "Stop"])
Sample = rep(dat[i, "Sample"], length(Position))
dat2 = rbind(dat2, cbind(Sample, Position))
}
dat2 = as.data.frame(dat2)
dat2$Sample = factor(dat2$Sample)
plot_1 = ggplot(dat2, aes(x=Position, fill=Sample)) +
theme_bw() +
opts(panel.grid.minor=theme_blank(), panel.grid.major=theme_blank()) +
geom_hline(yintercept=seq(0, 20), colour="grey80", size=0.15) +
geom_hline(yintercept=3, linetype=2) +
geom_histogram(binwidth=1) +
ylim(c(0, 20)) +
ylab("Count") +
opts(axis.title.x=theme_text(size=11, vjust=0.5)) +
opts(axis.title.y=theme_text(size=11, angle=90)) +
opts(title="Segment Plot")
png("plot_1.png", height=200, width=650)
print(plot_1)
dev.off()
Note that the way I've reformatted the dataframe is a bit ugly, and will not scale well (e.g. if you have millions of segments and/or large start and stop positions).
Upvotes: 1
Reputation: 60924
My first try:
dat$Sample = factor(dat$Sample)
ggplot(aes(x = Start, y = Sample, xend = Stop, yend = Sample, color = Sample), data = dat) +
geom_segment(size = 2) +
geom_segment(aes(x = Start, y = 0, xend = Stop, yend = 0), size = 2, alpha = 0.2, color = "black")
I combine two segment geometries here. One draws the colored vertical bars. These show where Samples have been measured. The second geometry draws the grey bar below where the density of the samples is shown. Any comments to improve on this quick hack?
Upvotes: 2