Ming

Reputation: 147

Create a percentage barplot with ggplot2

I was trying to use ggplot2 to create a percentage barplot.

An example dataframe:

sample   mapped(%) unmapped(%) reads
sample1   96.5      3.5        1320
sample2   97.4      2.6        1451
sample3   92.1      7.9        1824
sample4   98.7      1.3        1563

and I used the following code to create the barplot:

df <- algin %>% gather(col,reads,mapped:reads)
ggplot(df, aes(x = sample, y = reads, fill = col)) +
  geom_col(position = position_stack()) +
  coord_flip() +
  scale_fill_manual(
    "legend",
    values = c("mapped" = "darkred", "unmapped" = "red", "reads" = "darkblue")
  )

[image]

Although the barplot created here is close to what I want to display, it doesn't seem correct: e.g. the legend should show "mapped" in darkblue and "unmapped" in darkred.

I ended up with the values above after trying different settings; only this combination gave me the desired visual effect.

For example, I also tried

ggplot(df, aes(x = sample, y = reads, fill = col)) + 
  geom_col(position = position_stack()) + 
  coord_flip() + 
  scale_fill_manual(
    "legend", 
     values = c("mapped" = "darkblue", "unmapped" = "darkred", "reads" = "red")
  )

Then the plot looks like this: [image]

What I want to see is

  1. bar length represents the reads (sequencing reads) of each sample, with an M unit added to every x-axis value, e.g. 500M, 1000M, etc.;
  2. darkblue corresponds to the percentage of reads that were aligned (i.e. mapped) to the reference genome;
  3. darkred corresponds to the percentage of reads that were not aligned (i.e. unmapped) to the reference genome;
  4. legend: mapped and unmapped; it would be better to remove reads (as it does not need to be there).

An example of the desired plot is as follows: [image]

Solutions appreciated!

Thanks!

Upvotes: 1

Views: 141

Answers (2)

StupidWolf

Reputation: 46898

Your table:

df <- structure(list(sample = structure(1:4, .Label = c("sample1", 
"sample2", "sample3", "sample4"), class = "factor"), `mapped(%)` = c(96.5, 
97.4, 92.1, 98.7), `unmapped(%)` = c(3.5, 2.6, 7.9, 1.3), reads = c(1320L, 
1451L, 1824L, 1563L)), class = "data.frame", row.names = c(NA, 
-4L))

You need to calculate the number of mapped and unmapped reads from the percentages, then reshape the data into long format with pivot_longer(), which is similar to the gather() you used. We keep only the columns we need.

library(tidyverse)
plotdf <- df %>%
  mutate(
    mapped = `mapped(%)` * reads / 100,
    unmapped = `unmapped(%)` * reads / 100
  ) %>%
  select(sample, mapped, unmapped) %>%
  pivot_longer(-sample) %>%
  mutate(name = factor(name, levels = c("unmapped", "mapped")))
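As a quick sanity check on this step, the mapped and unmapped counts should add back up to the total reads. A minimal base-R sketch using the numbers from the question (the vector names here are illustrative and not part of the code above):

```r
# Values copied from the question's table
reads        <- c(1320, 1451, 1824, 1563)
mapped_pct   <- c(96.5, 97.4, 92.1, 98.7)
unmapped_pct <- c(3.5, 2.6, 7.9, 1.3)

# Convert percentages into read counts
mapped   <- mapped_pct * reads / 100
unmapped <- unmapped_pct * reads / 100

# The two parts should sum to the total (up to floating-point error)
stopifnot(isTRUE(all.equal(mapped + unmapped, reads)))
```

If the percentages in your real data do not sum to 100, this check fails, and the stacked bars would no longer match the total reads.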

Then we set the colors as you described and define the axis breaks. The plot itself is basically the code you already have:

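# mapped in dark blue, unmapped in dark red, as requested in the question;
# alpha() is applied per color so the names are set explicitly afterwards
COLS <- c(
  mapped = alpha("darkblue", 0.7),
  unmapped = alpha("darkred", 0.7)
)
BR <- seq(0, 1750, by = 250)
ggplot(plotdf, aes(x = sample, y = value, fill = name)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(breaks = BR, labels = paste0(BR, "M")) +
  scale_fill_manual("legend", values = COLS) +
  theme_light() +
  theme(legend.position = "bottom") +
  ylab("#Reads") + xlab("")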
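The upper limit of 1750 in BR is hard-coded for this data set. A small sketch, assuming you want to keep the same 250-read spacing, of deriving the breaks from the data instead (the `reads` vector and `step` are illustrative names):

```r
# Total reads per sample, copied from the question's table
reads <- c(sample1 = 1320, sample2 = 1451, sample3 = 1824, sample4 = 1563)
step  <- 250  # assumed spacing, same as in BR above

# Breaks from 0 up to the largest multiple of `step` below the maximum
BR <- seq(0, max(reads), by = step)

# Axis labels with the M suffix
BR_labels <- paste0(BR, "M")
```

For these four samples this gives the same breaks as seq(0, 1750, by = 250).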

[image]

Upvotes: 0

Robin Gertenbach

Reputation: 10776

Assuming these data:

library(tidyverse)

algin <- tribble(
  ~sample, ~mapped, ~unmapped, ~reads,
  "sample1", 96.5, 3.5, 1320,
  "sample2", 97.4, 2.6, 1451,
  "sample3", 92.1, 7.9, 1824,
  "sample4", 98.7, 1.3, 1563
) 

We can create the plotting df like this:

df <- algin %>%
  transmute(
    sample,
    mapped = reads * mapped / 100,
    unmapped = reads * unmapped / 100
  ) %>%
  gather(mapping, n, -sample)

And then plot what is pretty close to what you showed:

df %>%
  ggplot(
    aes(sample, n, 
        # Factor levels control the order of the colors
        fill = factor(mapping, levels = c( "unmapped","mapped")))
  ) +
  geom_col() +
  scale_fill_manual(
    # Control the shade with the colors of your example
    values = c("mapped" = "#427BB0", "unmapped" = "#B0064C"),
    # Control the text shown for each color in the legend
    # We could have directly named the new columns with capitalized names too
    labels = c("mapped" = "Mapped", "unmapped" = "Unmapped"),
    # Control the order in the legend
    breaks = c("mapped", "unmapped")  
  ) +
  # Flip sideways
  coord_flip() +
  # To not have the grey background
  theme_minimal() +
  theme(
    # Your example didn't have horizontal lines
    panel.grid.major.y = element_blank(),
    # Self explanatory
    legend.position = "bottom"
  ) +
  # Add M to everything except 0
  scale_y_continuous(labels = as_mapper(~ifelse(. == 0, "0",paste0(., "M")))) +
  labs(
    # Your example has no x axis label
    x = NULL,
    y = "# Reads",
    # The legend entries are self-explanatory, so drop the legend title
    fill = NULL
  )
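To see what the label function passed to scale_y_continuous() does, here is the same logic as a plain base-R function applied to a few example break values (the name `add_m` is illustrative):

```r
# Append "M" to every axis value except 0
add_m <- function(x) ifelse(x == 0, "0", paste0(x, "M"))

add_m(c(0, 500, 1000))
# [1] "0"     "500M"  "1000M"
```

ggplot2 calls the label function with the vector of break values, so the function only needs to map a numeric vector to a character vector of the same length.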

[image]

Upvotes: 2
