Kennedy Leverett

Reputation: 13

Grouped box plot

Here is what it looks like after those edits: lines but no boxes (new image).

Reproducible code:

df <- data.frame(SampleID = structure(c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), 
                                      .Label = c("C004", "C005", "C007", "C009", "C010", 
                                                 "C011", "C013", "C027", "C028", "C029", 
                                                 "C030", "C031", "C032", "C033", "C034", 
                                                 "C035", "C036", "C042", "C043", "C044", 
                                                 "C045", "C046", "C047", "C048", "C049", 
                                                 "C058", "C086"), class = "factor"), 
                 Sequencing.Depth = c(1L, 2612L, 5223L, 7834L, 10445L, 13056L, 15667L, 18278L, 
                                      20889L, 23500L), 
                 Observed.OTUs = c(1, 213, 289.5, 338, 377.8, 408.9, 434.4, 453.8, 472.1, NA), 
                 Mange = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
                                   .Label = c("N", "Y"), class = "factor"), 
                 SpeciesCode = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), 
                                         .Label = c("Cla", "Ucin", "Vvu"), class = "factor"))

Upvotes: 1

Views: 118

Answers (1)

dc37

Reputation: 16178

In your aes, set group to the interaction of your x values and your categorical variable, so that one boxplot is drawn per category at each x value on a continuous x axis. Then pass position = "identity" to geom_boxplot so the boxes sit at the exact x values instead of being dodged.

To add the line connecting the boxplots, I calculate the mean per Species per x value using dplyr directly inside the ggplot call, but you can also calculate it beforehand and store it in a second data frame.

Because your x values are spread widely, from 1 to 23500, you will have to increase the width of geom_boxplot in order to see a box and not a single line:

library(ggplot2)
library(dplyr)

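# df is the example data frame defined under "Reproducible example" below
ggplot(df, aes(x = Xvalues, y = Yvalues, color = Species, 
               group = interaction(Species, Xvalues))) +
  # one box per Species at each exact x value, no dodging
  geom_boxplot(position = "identity", width = 1000) +
  # line through the mean of each Species at each x value
  geom_line(data = df %>% 
              group_by(Xvalues, Species) %>% 
              summarise(Mean = mean(Yvalues)),
            aes(x = Xvalues, y = Mean, 
                color = Species, group = Species))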
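
As a variation on the code above, here is a minimal sketch of the "calculate outside" option: the means are stored in a second data frame and then passed to geom_line. The names toy and means_df and the toy values are assumptions made for illustration only; the column names match the example in this answer.

```r
library(ggplot2)
library(dplyr)

# Toy data (hypothetical values) with the same columns as the example df
set.seed(1)
toy <- data.frame(Xvalues = rep(c(10, 2000, 23500), each = 30),
                  Species = rep(rep(LETTERS[1:3], each = 10), 3),
                  Yvalues = rnorm(90, mean = 10, sd = 1))

# Second data frame: one mean per Species per x value
means_df <- toy %>%
  group_by(Xvalues, Species) %>%
  summarise(Mean = mean(Yvalues, na.rm = TRUE), .groups = "drop")

# Boxes come from the raw data, the connecting line from the precomputed means
p <- ggplot(toy, aes(x = Xvalues, y = Yvalues, color = Species,
                     group = interaction(Species, Xvalues))) +
  geom_boxplot(position = "identity", width = 1000) +
  geom_line(data = means_df,
            aes(x = Xvalues, y = Mean, color = Species, group = Species))
p
```

Keeping the summary in its own data frame makes it easy to inspect the means before plotting, and .groups = "drop" returns an ungrouped result without the grouping message from summarise.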


Applied to your dataset (based on the information you provided in your code), you should try something like:

library(ggplot2)
library(dplyr)

ggplot(observedotusrare, 
       aes(x = Sequencing.Depth, y = Observed.OTUs, 
           color = SpeciesCode,
           group = interaction(Sequencing.Depth, SpeciesCode))) + 
  geom_boxplot(position = "identity", width = 1000) + 
  geom_line(data = observedotusrare %>% 
              group_by(Sequencing.Depth, SpeciesCode) %>%
              summarise(Mean = mean(Observed.OTUs, na.rm = TRUE)),
            aes(x = Sequencing.Depth, y = Mean, 
                color = SpeciesCode, group = SpeciesCode))
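
The width of 1000 used above depends on how far apart the x values are. One possible way to pick it, sketched here in base R, is to take a fraction of the smallest gap between neighbouring x values. The 40% factor and the names depths, min_gap and box_width are assumptions for illustration, not a ggplot2 rule; the depths are the Sequencing.Depth values from the sample data in the question.

```r
# Sequencing depths from the question's sample data
depths <- c(1, 2612, 5223, 7834, 10445, 13056, 15667, 18278, 20889, 23500)

# Smallest gap between neighbouring x positions
min_gap <- min(diff(sort(unique(depths))))

# Hypothetical rule of thumb: let each box span 40% of the smallest gap
box_width <- 0.4 * min_gap
box_width
```

Here the depths are evenly spaced 2611 apart, so the result is about 1044, close to the width = 1000 used above. The value could then be passed as geom_boxplot(position = "identity", width = box_width).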

Does this answer your question?


Reproducible example

set.seed(123)  # arbitrary seed so the random values are repeatable
df <- data.frame(Xvalues = rep(c(10,2000,23500), each = 30),
                 Species = rep(rep(LETTERS[1:3], each = 10),3),
                 Yvalues = c(rnorm(10,1,1),
                             rnorm(10,5,1),
                             rnorm(10,8,1),
                             rnorm(10,5,1),
                             rnorm(10,8,1),
                             rnorm(10,12,1),
                             rnorm(10,20,1),
                             rnorm(10,30,1),
                             rnorm(10,50,1)))

Upvotes: 2
