Reputation: 73
I have counts of species in different samples, and each species belongs to a specific group.
What I would like to show is a plot where the different species are colored by their group. So each species has its own color, but in different shades according to the group it belongs to. E.g. all bacteria in shades of grey, all trees in shades of green, all algae in shades of blue.
Ideally also the legend would be grouped by these groups.
This is how my data is structured:
library("ggplot2")
library("tidyverse")
mydata <- data.frame(counts=c(560, 310, 250, 243, 124, 306, 1271, 112, 201, 305, 201, 304, 136, 211, 131),
                     species=c("bact1", "bact1", "shrub1", "shrub1", "tree1", "tree1", "tree2", "algae1", "algae1", "bact2", "bact3", "tree3", "algae2", "shrub2", "shrub2"),
                     sample=c(1,2,1,1,2,2,1,1,2,2,2,2,1,1,2),
                     group=c("bacterium", "bacterium", "shrub", "shrub", "tree", "tree", "tree", "algae", "algae", "bacterium", "bacterium", "tree", "algae", "shrub", "shrub"))
> mydata
# A tibble: 15 x 4
   counts species sample group
    <dbl> <chr>    <dbl> <fct>
 1    560 bact1        1 bacterium
 2    310 bact1        2 bacterium
 3    250 a-tree1      1 tree
 4    243 c-tree2      1 tree
 5    124 c-tree1      2 tree
 6    306 a-tree1      2 tree
 7   1271 tree2        1 tree
 8    112 algae1       1 algae
 9    201 algae1       2 algae
10    305 bact2        2 bacterium
11    201 bact3        2 bacterium
12    304 tree3        2 tree
13    136 algae2       1 algae
14    211 tree2        1 tree
15    131 tree2        2 tree
I already have nice-looking stacked bar charts, but the colors are assigned in the order of the labels, which is alphabetical (to show this, I named some of them a- and c-tree).
This is the basic plot:
myplot <- ggplot(mydata, aes(x=sample, y=counts, fill=species))+
  geom_bar(stat="identity", position = "fill") +
  labs(x = "Samples", y = "Percentage of reads", fill = "Classification") +
  theme(legend.position="bottom")
myplot
I tried many of the suggestions I found on Stack Overflow, but couldn't get any of them to work. The most promising approach, I thought, was to convert the group to a factor:
mydata$group <- as.factor(mydata$group)
and then add it to the plot as a subgroup:
myplot <- ggplot(mydata, aes(x=sample, y=counts, fill=species, subgroup=group))+
  geom_bar(stat="identity", position = "fill") +
  labs(x = "Samples", y = "Percentage of reads", fill = "Classification") +
  theme(legend.position="bottom")
But this doesn't change a thing. And this would only be the first step towards giving the groups a color.
One reason why most of the answers given on Stack Overflow don't work for me is that I have a lot of data, and not every species is present in every sample. So I can't just give every single species a color by hand. I was already so desperate that I tried that too, but it wouldn't work for some reason, and anyway it would not be a neat way to do it...
Thank you very much for any help!
Upvotes: 4
Views: 841
Reputation: 643
Here is an approach using hcl.colors() that can also handle factors that are not in alphabetical order. I also use forcats::fct_relevel() so that the species are listed in the order of the color shades rather than a-z; see the "Factors with forcats" cheat sheet.
set.seed(1)
library("ggplot2")
library("tidyverse")
mydata <- tibble(counts=c(560, 310, 250, 243, 124, 306, 1271, 112, 201, 305, 201, 304, 136, 211, 131),
                 species=c("zbact1", "bact1", "shrub1", "shrub1", "tree1", "tree1", "tree2", "algae1", "algae1", "bact2", "bact3", "tree3", "algae2", "shrub2", "shrub2"),
                 sample=c(1,2,1,1,2,2,1,1,2,2,2,2,1,1,2),
                 group=c("bacterium", "bacterium", "shrub", "shrub", "tree", "tree", "tree", "algae", "algae", "bacterium", "bacterium", "tree", "algae", "shrub", "shrub"))
mydata$species <- as.factor(mydata$species)
mydata$group <- as.factor(mydata$group)
make_pal <- function(group, sub){
  stopifnot(
    is.factor(group),
    is.factor(sub)
  )
  # single-hue sequential palettes available in hcl.colors()
  # (one per group level, so this supports at most six groups)
  mono_pals <- c("Blues", "Greens", "Oranges", "Purples", "Reds", "Grays")
  # how many sub levels per group level
  data <- tibble(group = group, sub = sub) %>%
    distinct()
  d_count <- data %>%
    count(group)
  # sub levels ordered by group, in the same order as the colors below
  names_vec <- data %>%
    arrange(group) %>%
    magrittr::extract("sub") %>%
    unlist
  # make a named vector to be used with scale_fill_manual
  l <- list(
    n = d_count[["n"]],
    name = mono_pals[1:length(levels(group))]
  )
  map2(l$n,
       l$name,
       hcl.colors) %>%
    flatten_chr() %>%
    set_names(names_vec)
}
custom_pal <- make_pal(mydata$group, mydata$species)
mydata$species <- fct_relevel(mydata$species, names(custom_pal))
myplot <- mydata %>%
  ggplot(aes(x=sample, y=counts, fill=species))+
  geom_bar(stat="identity", position = "fill") +
  labs(x = "Samples", y = "Percentage of reads", fill = "Classification") +
  scale_fill_manual(values = custom_pal)+
  theme(legend.position="bottom")
myplot
Created on 2019-07-24 by the reprex package (v0.3.0)
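If you want to decide yourself which palette each group gets (grey for bacteria, green for trees, blue for algae, as in the question), a base-R variation on the same idea may be easier to adapt. This is only a sketch: the `lookup` table and the `group_pals` mapping are made up for illustration, and it assumes R >= 3.6 for `hcl.colors()`. It asks for one extra shade per group and drops the last one, on the assumption that these sequential palettes end in a near-white shade that is hard to see on a plot.

```r
# Hypothetical lookup table: one row per species, with its group.
lookup <- data.frame(
  species = c("bact1", "bact2", "tree1", "tree2", "tree3", "algae1"),
  group   = c("bacterium", "bacterium", "tree", "tree", "tree", "algae"),
  stringsAsFactors = FALSE
)

# Assumed mapping from group to an hcl.colors() palette name.
group_pals <- c(bacterium = "Grays", tree = "Greens", algae = "Blues")

# Build one named vector of colors: names are species, values are hex codes.
pal <- unlist(lapply(names(group_pals), function(g) {
  sp <- lookup$species[lookup$group == g]
  # Request one shade more than needed and drop the last (assumed lightest) one.
  shades <- hcl.colors(length(sp) + 1, group_pals[[g]])[seq_along(sp)]
  setNames(shades, sp)
}))

# Ordering the factor levels like the palette keeps the legend grouped.
species_f <- factor(lookup$species, levels = names(pal))
```

`pal` can then be passed to `scale_fill_manual(values = pal)`. Because the colors are matched by name, it should not matter that some species are missing from some samples.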
Upvotes: 1
Reputation: 144
Not the nicest way, but I used a workaround inspired by this post. I mapped the fill aesthetic to group and alpha to species. Then I needed to add a white background layer to make it look nicer. Finally, I changed the range of alpha to restrict its minimum value.
Here is the result.
myplot <- ggplot(mydata, aes(x=sample, y=counts, fill=group, alpha=species))+
  geom_bar(fill = "white", alpha = 1, stat="identity", position = "fill") +
  geom_bar(stat="identity", position = "fill") +
  labs(x = "Samples", y = "Percentage of reads", fill = "Classification") +
  theme(legend.position="bottom") +
  scale_alpha_discrete(range = c(0.3, 1))
myplot
And the result looks like this:
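To pin each group to a chosen base color (grey for bacteria, green for trees, blue for algae, as in the question), `scale_fill_manual()` can be added to the same plot. A minimal sketch, assuming a small made-up data frame `toy` in the shape of `mydata` and hypothetical color choices in `group_cols`:

```r
library(ggplot2)

# Made-up data in the same shape as the question's `mydata`.
toy <- data.frame(
  counts  = c(560, 310, 124, 306, 112, 201),
  species = c("bact1", "bact2", "tree1", "tree2", "algae1", "algae2"),
  sample  = c(1, 2, 1, 2, 1, 2),
  group   = c("bacterium", "bacterium", "tree", "tree", "algae", "algae")
)

# Hypothetical base colors, one per group, matched by name.
group_cols <- c(bacterium = "grey30", tree = "forestgreen", algae = "steelblue")

p <- ggplot(toy, aes(x = factor(sample), y = counts, fill = group, alpha = species)) +
  # opaque white layer so the transparent bars are not tinted by the panel background
  geom_bar(fill = "white", alpha = 1, stat = "identity", position = "fill") +
  geom_bar(stat = "identity", position = "fill") +
  scale_fill_manual(values = group_cols) +
  scale_alpha_discrete(range = c(0.3, 1)) +
  theme(legend.position = "bottom")
p
```

Note that the alpha levels are spread over all species together, so the shades do not restart within each group.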
Upvotes: 0