Radwa Sharaf
Radwa Sharaf

Reputation: 179

Plotting coordinates for sequence alignment

I would like to reproduce this figure from a recent publication in R, but I am unsure how.

enter image description here

The plot's idea is simple. At the top is a representation of a full length virus sequence and each line underneath it depicts a sequenced isolate.

For each sequence, there are two pieces of information:

  1. Where the sequence starts, ends, deleted, etc For instance: sequence 1, starts at position 1 and ends at position 9000 But sequence 2, starts at position 1, ends at 2000, in the middle everything is deleted and then starts again at 8000-9000

  2. Color-coded based on whether it is deleted, full-length, etc

Initially I thought I could use a bar graph, like this one I drew in Illustrator, where the x would be essentially each sequence line and the y would be the coordinates of where it maps. But I am not sure if this will allow me to designate "gaps", as seen in sequence 3 in my illustrator picture.

enter image description here

The data itself is organized as such:

Sequence name    Mapped Start   Mapped End
1                1              9000
2                4000           9000 
3                1              2000
3                7000           9000

The data set only includes the mapped start and end positions, not the deleted positions.

Would appreciate hearing your input!

Thanks

Upvotes: 1

Views: 762

Answers (1)

caw5cv
caw5cv

Reputation: 721

I might recommend using a series of geoms, one for each sequence. If you have your data organized in a certain way, it would be fairly straightforward. For example, if your data is in long format as follows:

dat <- data.frame(sequence=c(1,2,2,2), start=c(1,1,2001,8000), stop=c(9000,2000,7999,9000), type=c("mapped","mapped","deletion","mapped"))

Which looks like...

sequence start stop     type
       1     1 9000   mapped
       2     1 2000   mapped
       2  2001 7999 deletion
       2  8000 9000   mapped

You could do the following:

library(ggplot2)

g <- ggplot(data=dat, mapping=aes(ymin=0, ymax=1, xmin=start, xmax=stop, fill=type)) +
geom_rect() + facet_grid(sequence~., switch="y") +
labs(x="Position (BP)", y="Sequence / Strain", title="Mapped regions for all sequences") +
theme(axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
theme(plot.title = element_text(hjust = 0.5))

Which looks likethis

Upvotes: 2

Related Questions