Reputation: 147
I am trying to use ggplot2 to create a percentage barplot.
An example dataframe:
sample mapped(%) unmapped(%) reads
sample1 96.5 3.5 1320
sample2 97.4 2.6 1451
sample3 92.1 7.9 1824
sample4 98.7 1.3 1563
I used the following code to create the barplot:
df <- algin %>% gather(col,reads,mapped:reads)
ggplot(df, aes(x = sample, y = reads, fill = col)) +
  geom_col(position = position_stack()) +
  coord_flip() +
  scale_fill_manual(
    "legend",
    values = c("mapped" = "darkred", "unmapped" = "red", "reads" = "darkblue")
  )
Although the resulting barplot is close to what I want to display, it doesn't seem correct: for example, the legend should show "mapped" in dark blue and "unmapped" in dark red.
I only set the values above because, of the different settings I tried, they were the only ones that gave me the desired visual effect.
For example, I also tried
ggplot(df, aes(x = sample, y = reads, fill = col)) +
  geom_col(position = position_stack()) +
  coord_flip() +
  scale_fill_manual(
    "legend",
    values = c("mapped" = "darkblue", "unmapped" = "darkred", "reads" = "red")
  )
What I want to see is:
reads (sequencing reads) of each sample, with an M unit added to every x-axis value, e.g. 500M, 1000M, etc.;
mapped (reads mapped to the reference genome);
unmapped (reads unmapped to the reference genome).
An example of the desired plot is as follows
Solutions appreciated!
Thanks!
Upvotes: 1
Views: 141
Reputation: 46898
Your table:
df <- structure(list(sample = structure(1:4, .Label = c("sample1",
"sample2", "sample3", "sample4"), class = "factor"), `mapped(%)` = c(96.5,
97.4, 92.1, 98.7), `unmapped(%)` = c(3.5, 2.6, 7.9, 1.3), reads = c(1320L,
1451L, 1824L, 1563L)), class = "data.frame", row.names = c(NA,
-4L))
You need to calculate the number of mapped and unmapped reads from the percentages. We then reshape the result into long format with pivot_longer(), which is similar to the gather() you used, keeping only the columns we need.
library(tidyverse)
plotdf <- df %>%
  mutate(mapped = `mapped(%)` * reads / 100,
         unmapped = `unmapped(%)` * reads / 100) %>%
  select(sample, mapped, unmapped) %>%
  pivot_longer(-sample) %>%
  mutate(name = factor(name, levels = c("unmapped", "mapped")))
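As a quick sanity check on the conversion above, here is a minimal base R sketch using the numbers from the question's table. The variable names (`mapped_pct`, `mapped_n`, and so on) are made up for illustration; the point is only that the mapped and unmapped counts should add back up to the total reads.

```r
# Values copied from the question's table; names are illustrative only
reads        <- c(sample1 = 1320, sample2 = 1451, sample3 = 1824, sample4 = 1563)
mapped_pct   <- c(96.5, 97.4, 92.1, 98.7)
unmapped_pct <- c(3.5, 2.6, 7.9, 1.3)

# Percentage of the total -> number of reads
mapped_n   <- mapped_pct * reads / 100
unmapped_n <- unmapped_pct * reads / 100

# Each pair of counts should sum to the sample's total (up to floating point error)
all.equal(mapped_n + unmapped_n, reads)
```

If the check fails for your real data, the two percentage columns probably do not sum to 100 for every sample.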
Then we set the colors as you described (mapped in dark blue, unmapped in dark red) and define the axis breaks. The plot itself is basically what you already have:
# Named vector, so each category gets its color regardless of factor order
COLS <- c("mapped" = "darkblue", "unmapped" = "darkred")
BR <- seq(0, 1750, by = 250)
ggplot(plotdf, aes(x = sample, y = value, fill = name)) +
  scale_y_continuous(breaks = BR, labels = paste(BR, "M", sep = "")) +
  # Transparency is applied to the bars rather than to the palette
  geom_col(alpha = 0.7) + coord_flip() + scale_fill_manual("legend", values = COLS) +
  theme_light() +
  theme(legend.position = "bottom") +
  ylab("#Reads") + xlab("")
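The breaks above are hard-coded for this particular table. A possible variation, sketched in base R under the assumption that a step of 250 still suits your data, derives them from the largest total instead (`step`, `br` and `br_labels` are hypothetical names):

```r
# Totals per sample, copied from the question's table
reads <- c(1320, 1451, 1824, 1563)

# Breaks from 0 up to the largest total, in steps of 250 (the step is an assumption)
step <- 250
br <- seq(0, max(reads), by = step)

# Matching axis labels with the M suffix
br_labels <- paste0(br, "M")
br_labels
```

The resulting vectors could be passed to `scale_y_continuous(breaks = br, labels = br_labels)` in place of the fixed `BR`.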
Upvotes: 0
Reputation: 10776
Assuming these data:
library(tidyverse)
algin <- tribble(
~sample, ~mapped, ~unmapped, ~reads,
"sample1", 96.5, 3.5, 1320,
"sample2", 97.4, 2.6, 1451,
"sample3", 92.1, 7.9, 1824,
"sample4", 98.7, 1.3, 1563
)
We can create the plotting df like this:
df <- algin %>%
  transmute(
    sample,
    mapped = reads * mapped / 100,
    unmapped = reads * unmapped / 100
  ) %>%
  gather(mapping, n, -sample)
And then plot what is pretty close to what you showed:
df %>%
  ggplot(
    aes(sample, n,
        # Factor levels control the order of the colors
        fill = factor(mapping, levels = c("unmapped", "mapped")))
  ) +
  geom_col() +
  scale_fill_manual(
    # Control the shade with the colors of your example
    values = c("mapped" = "#427BB0", "unmapped" = "#B0064C"),
    # Control how the categories are labelled in the legend
    # We could have directly named the new columns with CamelCase too
    labels = c("mapped" = "Mapped", "unmapped" = "Unmapped"),
    # Control the order in the legend
    breaks = c("mapped", "unmapped")
  ) +
  # Flip sideways
  coord_flip() +
  # To not have the grey background
  theme_minimal() +
  theme(
    # Your example didn't have horizontal lines
    panel.grid.major.y = element_blank(),
    # Self explanatory
    legend.position = "bottom"
  ) +
  # Add M to everything except 0
  scale_y_continuous(labels = as_mapper(~ifelse(. == 0, "0", paste0(., "M")))) +
  labs(
    # Your example has no x axis label
    x = NULL,
    y = "# Reads",
    # The values are self explanatory
    fill = NULL
  )
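To see what the label formula in `scale_y_continuous()` does on its own, here is a small base R sketch of the same rule written as a named function. `label_m` is a hypothetical name, not something from ggplot2 or purrr.

```r
# Same rule as the formula above: leave 0 alone, append "M" to everything else
label_m <- function(x) ifelse(x == 0, "0", paste0(x, "M"))

label_m(c(0, 500, 1000, 1500))
# "0" "500M" "1000M" "1500M"
```

A plain function like this can also be passed directly as `labels = label_m`, which avoids the `as_mapper()` wrapper if you prefer.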
Upvotes: 2