stack_learner

Reputation: 263

Problem in creating boxplots with R shiny app

I'm very new to Shiny in R. I'm trying to make simple boxplots in a Shiny app for a dataset.

Some example data is in the file df.csv. Here is the dput of that data:

structure(list(Samples = structure(1:10, .Label = c("Sample1", 
"Sample10", "Sample2", "Sample3", "Sample4", "Sample5", "Sample6", 
"Sample7", "Sample8", "Sample9"), class = "factor"), Type = structure(c(2L, 
1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L), .Label = c("Normal", "Tumor"
), class = "factor"), A1BG = c(0, 0.01869105, 0.026705782, 0.016576987, 
0, 0.007636787, 0.015756547, 0.00609601, 0.115575528, 0.04717536
), A1BG.AS1 = c(0, 0.096652515, 0.086710002, 0.04683499, 0.188283185, 
0.104318353, 0.102735593, 0.100064808, 0.04717536, 0.159745808
), A1CF = c(1.616942802, 1.367084444, 1.101855892, 1.3823884, 
0.631627098, 2.407159505, 1.687449785, 1.229844138, 0.87989414, 
0.642785868), A2M = c(3.357654845, 3.149165846, 3.654774122, 
2.851143092, 2.952601867, 4.002335454, 4.123949457, 3.691343955, 
3.553064673, 3.425443559), A2M.AS1 = c(0.217308191, 0.08268571, 
0.297320544, 0.101579093, 0.020102613, 0.35578965, 0.288014115, 
0.145352771, 0.043808388, 0.104677012), A2ML1 = c(0, 0.017949113, 
0.00984907, 0.002289616, 0, 0.002100359, 0.032146138, 0.052275569, 
0.537892142, 0), A2ML1.AS1 = c(0.631627098, 0.04717536, 1.229844138, 
0, 4.002335454, 0, 1.229844138, 1.229844138, 0.04717536, 0)), row.names = c(NA, 
-10L), class = "data.frame")

With the above information, I am trying to make a shiny app. My code looks like below:

library(shiny)

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      selectInput("thegene", "Gene", choices = c("A2M", "A1CF", "A2MP1"), selected = "A2M"),
      radioButtons("colour","Colour of histogram",choices=c("red","green","blue"),selected="red"),
      width = 3
    ),
    mainPanel(
      plotOutput("boxplot"),
      width = 9
    )
  )
)

server <- function(input, output) {

  df <- read.csv("df.csv")

  library(reshape2)
  library(ggplot2)
  library(ggpubr)
  library(EnvStats)

  df.m <- melt(df, c("Samples", "Type"))

  output$boxplot <- renderPlot({
    ggplot(data=df.m, aes(x = Type, y = value, fill=variable)) +
      geom_boxplot() +
      theme_bw(base_size = 14) + xlab("") + ylab("Expression logFPKM") +
      theme(axis.text=element_text(size=15, face = "bold", color = "black"),
            axis.title=element_text(size=15, face = "bold", color = "black"),
            strip.text = element_text(size=15, face = "bold", color = "black")) +
      stat_compare_means(method = "t.test", size=5) + stat_n_text()
  })

}

# Run the application 
shinyApp(ui = ui, server = server)

So, I reshaped the data to long format and then tried to build an app that draws a boxplot for each gene, comparing Tumor (6 samples) with Normal (4 samples).

I don't see any error, but I also don't get the desired result. The plot produced by the code above has the following problems:

1) The number of samples in the boxplot below each Type is wrong.

2) The gene selector shows only three genes. How can I make the other genes selectable?

3) The colour selection has no effect on the plot.

Any help is appreciated. Thank you.

Upvotes: 0

Views: 719

Answers (1)

r2evans

Reputation: 160447

Try this.

I made a few changes; you might keep some and revert others.

  1. I do not have ggpubr or EnvStats, so I removed some of the plotting summaries.
  2. I have static data defined, you should likely return to your read.csv solution.
  3. I added session to the server declaration, required if you want to update any inputs programmatically.
  4. I have an inefficient reactive block that just returns all of the original data; as it stands now, this is anti-idiomatic, but it was added solely to demonstrate the use of updateSelectInput if/when the source data changes. This is necessary only if your data changes dynamically (e.g., user-uploaded data or a database query); otherwise alldat() should really just be df.m (and your input should be defined statically).
  5. I updated the use of the color radio button.

library(shiny)
library(reshape2)
library(ggplot2)

df <- structure(list(Samples = structure(1:10, .Label = c("Sample1", 
"Sample10", "Sample2", "Sample3", "Sample4", "Sample5", "Sample6", 
"Sample7", "Sample8", "Sample9"), class = "factor"), Type = structure(c(2L, 
1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L), .Label = c("Normal", "Tumor"
), class = "factor"), A1BG = c(0, 0.01869105, 0.026705782, 0.016576987, 
0, 0.007636787, 0.015756547, 0.00609601, 0.115575528, 0.04717536
), A1BG.AS1 = c(0, 0.096652515, 0.086710002, 0.04683499, 0.188283185, 
0.104318353, 0.102735593, 0.100064808, 0.04717536, 0.159745808
), A1CF = c(1.616942802, 1.367084444, 1.101855892, 1.3823884, 
0.631627098, 2.407159505, 1.687449785, 1.229844138, 0.87989414, 
0.642785868), A2M = c(3.357654845, 3.149165846, 3.654774122, 
2.851143092, 2.952601867, 4.002335454, 4.123949457, 3.691343955, 
3.553064673, 3.425443559), A2M.AS1 = c(0.217308191, 0.08268571, 
0.297320544, 0.101579093, 0.020102613, 0.35578965, 0.288014115, 
0.145352771, 0.043808388, 0.104677012), A2ML1 = c(0, 0.017949113, 
0.00984907, 0.002289616, 0, 0.002100359, 0.032146138, 0.052275569, 
0.537892142, 0), A2ML1.AS1 = c(0.631627098, 0.04717536, 1.229844138, 
0, 4.002335454, 0, 1.229844138, 1.229844138, 0.04717536, 0)), row.names = c(NA, 
-10L), class = "data.frame")
df.m <- reshape2::melt(df, c("Samples", "Type"))

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      selectInput("thegene", "Gene", choices = c("A2M", "A1CF", "A2MP1"), selected = "A2M"),
      radioButtons("colour","Colour of histogram",choices=c("red","green","blue"),selected="red"),
      width = 3
    ),
    mainPanel(
      plotOutput("boxplot"),
      width = 9
    )
  )
)

server <- function(input, output, session) {

  alldat <- reactive({
    # this is not an efficient use of a reactive block: since it does
    # not depend on any dynamic data, it will fire only once, so if
    # your data is static then this might be a touch overkill ... but
    # the premise is that your `df.m` is data that can change based on
    # updating it (e.g., DB query) or user-uploaded data (e.g., CSV
    # upload)
    choices <- unique(df.m$variable)
    selected <- isolate(input$thegene)
    if (!selected %in% choices) selected <- choices[1]
    updateSelectInput(session, "thegene", choices = choices, selected = selected)
    df.m
  })

  dat <- reactive({
    x <- alldat()
    x[ x$variable == input$thegene,,drop=FALSE]
  })

  output$boxplot <- renderPlot({
    ggplot(data = dat(), aes(x = Type, y = value, fill = variable)) +
      geom_boxplot() +
      theme_bw(base_size = 14) + xlab("") + ylab("Expression logFPKM") +
      theme(axis.text=element_text(size=15, face = "bold", color = "black"),
            axis.title=element_text(size=15, face = "bold", color = "black"),
            strip.text = element_text(size=15, face = "bold", color = "black")) +
      scale_fill_manual(values = input$colour)
  })

}

# Run the application 
shinyApp(ui = ui, server = server)
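
For the static-data case described in point 4, a minimal sketch might look like the following. It assumes the data never changes while the app runs, so the gene choices are taken straight from the melted data and no updateSelectInput call is needed. The two-gene data frame is a cut-down copy of the question's data, and the names gene_choices and app are illustrative only, not part of the code above.

```r
library(shiny)
library(reshape2)
library(ggplot2)

# Cut-down copy of the question's data: Type plus two of the gene columns.
df <- data.frame(
  Samples = c("Sample1", "Sample10", "Sample2", "Sample3", "Sample4",
              "Sample5", "Sample6", "Sample7", "Sample8", "Sample9"),
  Type = c("Tumor", "Normal", "Tumor", "Tumor", "Tumor",
           "Normal", "Tumor", "Tumor", "Normal", "Normal"),
  A1CF = c(1.616942802, 1.367084444, 1.101855892, 1.3823884, 0.631627098,
           2.407159505, 1.687449785, 1.229844138, 0.87989414, 0.642785868),
  A2M = c(3.357654845, 3.149165846, 3.654774122, 2.851143092, 2.952601867,
          4.002335454, 4.123949457, 3.691343955, 3.553064673, 3.425443559)
)
df.m <- reshape2::melt(df, id.vars = c("Samples", "Type"))

# Static data, so the choices can be computed once, before the UI is built.
gene_choices <- levels(df.m$variable)

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      selectInput("thegene", "Gene", choices = gene_choices,
                  selected = gene_choices[1]),
      radioButtons("colour", "Colour of boxplot",
                   choices = c("red", "green", "blue"), selected = "red"),
      width = 3
    ),
    mainPanel(plotOutput("boxplot"), width = 9)
  )
)

server <- function(input, output, session) {
  # Filter the static long-format data down to the selected gene.
  dat <- reactive({
    df.m[df.m$variable == input$thegene, , drop = FALSE]
  })

  output$boxplot <- renderPlot({
    ggplot(dat(), aes(x = Type, y = value, fill = variable)) +
      geom_boxplot() +
      theme_bw(base_size = 14) + xlab("") + ylab("Expression logFPKM") +
      scale_fill_manual(values = input$colour)
  })
}

# Build the app object; launch it from an interactive session only.
app <- shinyApp(ui = ui, server = server)
if (interactive()) runApp(app)
```

Because the choices are derived from the data, every gene column in df appears in the selector without being listed by hand.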

Some notes/opinions:

  • When there is dynamic data due to filtering or user-supplied modifiers, I find it nice to have a reactive block that does just the filtering/modifying, so that the modified data can be used in multiple dependent reactive blocks, ergo my dat <- reactive(...).
  • More to the point, I find many not-so-good shiny apps that try to do way too much in a single reactive block; when I see a lot going on, I tend to think either (a) split the reactive block into smaller ones, especially when code is repeated in multiple blocks; and/or (b) write external functions that do most of that work, so that the shiny app itself appears more compact. Declarative function names can make readability/maintainability much easier (and can be unit-tested!).
  • I have not added any safeguards to this; one such safeguard (though this app does not show it right away) would be the use of req() to ensure that the inputs have "stabilized" during startup. With larger apps, one might notice that a few reactive blocks fire before (say) input$thegene has a valid value, which can cause some plots/tables to flicker.
  • When there is a select input that will quickly be overwritten/updated, I generally go with something like choices="(initializing)" or something similar; in this case, having reasonable default choices makes sense as long as those choices are very likely or certain to be present in the real data.
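
As a rough illustration of the second and third notes, the filtering can be moved into a plain function and the reactive guarded with req(). This is only a sketch: filter_gene and long_df are hypothetical names, and the four-row data frame reuses a few A2M and A1CF values from the question so the function can be checked outside of shiny.

```r
# Hypothetical helper: keep only the rows for one gene in long-format data.
# It has no shiny dependency, so it can be unit-tested on its own.
filter_gene <- function(long_df, gene) {
  stopifnot(is.data.frame(long_df), "variable" %in% names(long_df))
  long_df[long_df$variable == gene, , drop = FALSE]
}

# Tiny long-format example (first two samples of A2M and A1CF).
long_df <- data.frame(
  Type     = rep(c("Tumor", "Normal"), times = 2),
  variable = rep(c("A2M", "A1CF"), each = 2),
  value    = c(3.357654845, 3.149165846, 1.616942802, 1.367084444)
)

# Server sketch: the reactive only guards the input and delegates the work.
server <- function(input, output, session) {
  dat <- shiny::reactive({
    shiny::req(input$thegene)  # halt quietly until a gene has been selected
    filter_gene(long_df, input$thegene)
  })
  # outputs that depend on dat() would go here
}

# Quick checks of the helper, no running app needed.
a2m <- filter_gene(long_df, "A2M")
stopifnot(nrow(a2m) == 2, all(a2m$variable == "A2M"))

# A gene that is absent from the data gives an empty data frame, not an error.
stopifnot(nrow(filter_gene(long_df, "A2MP1")) == 0)
```

The last check mirrors the "A2MP1" choice in the original UI: that gene is not in the data, so selecting it would leave the plot with nothing to draw.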

Upvotes: 2
