nate

Reputation: 1244

ddply, Shiny, and incorrect summary output

I am posting my question with a reproducible example. The problem: I cannot get ddply to report the mean systolic blood pressure across different factors, i.e. tobacco use, gender, etc. when using Shiny. I can generate the appropriate output in RStudio, but cannot get the function to report the specific values by factor level when using Shiny.

An example of the code that works in R:

 library(plyr)

 a <- runif(99, 0, 5)
 b <- rep(c("A", "B", "C"), 33)
 df2 <- data.frame(numVar = a, factVar = b)
 # Mean and SD of numVar within each level of factVar
 res <- ddply(df2, .(factVar), summarize,
              mean = round(mean(numVar), 2),
              sd = round(sd(numVar), 2))

To use my Shiny application, the user first uploads a .csv file. The file is then split into numeric and factor variables. The variables are then stored in reactive dataframes named passdfnum and passdffact, respectively. When the user is on my histogram tab, they select a numeric variable (input$histVar) and a factor variable (input$histDensityVarFactor). I then create a temporary dataframe dataM in the subsetData function that consists of only the selected fields using this code:

   subsetData <- reactive({
     if (input$histDensityVarFactor != "None") {
       dataM <- data.frame(numVar = passdfnum()[input$histVar],
                           factVar = passdffact()[input$histDensityVarFactor])
     }
   })

This generates a data frame with the row names and the two columns selected by input$histVar and input$histDensityVarFactor. Note: the structure of this data frame is the same as the data frame in the example.
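One detail worth checking here is how data.frame() names the columns when each argument is itself a one-column data frame. The sketch below uses hypothetical stand-ins (num_df, fact_df, and the column names sbp, dbp, tobacco) in place of passdfnum(), passdffact(), and the uploaded data. It suggests the resulting columns keep their original names rather than becoming numVar and factVar.

```r
# Hypothetical stand-ins for passdfnum() and passdffact()
num_df  <- data.frame(sbp = c(120, 135, 128), dbp = c(80, 85, 82))
fact_df <- data.frame(tobacco = c("No", "Yes", "No"))

# Same construction as in subsetData(), with fixed column choices
dataM <- data.frame(numVar = num_df["sbp"], factVar = fact_df["tobacco"])

# The one-column data frames keep their own column names
print(names(dataM))
```

This is why the columns have to be addressed by the selected input names later on, not as numVar and factVar.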

I create a graph of the data in ggplot2, just fine. Then I go to create a summary table of the numerical variable by factor level and everything goes to heck.

The code I am using in Shiny is this:

 output$histMeans <- renderPrint({
   if (input$histDensityVarFactor != "None") {
     numVar <- noquote(names(passdfnum()[input$histVar]))
     factVar <- noquote(names(passdffact()[input$histDensityVarFactor]))

     res <- ddply(dataM, .(dataM[[factVar]]), here(summarize),
                  mean = round(mean(dataM[[input$histVar]]), 2),
                  sd = round(sd(dataM[[numVar]]), 2))
     res
   }
 })

The resulting output looks like this (note: factor variables are coded as Yes/No or High/Medium/Low):

  dataM[[factVar]]   mean    sd
1               No 127.55 15.31
2              Yes 127.55 15.31

An interesting note: if I use input$histDensityVarFactor instead of .(dataM[[factVar]]), as in:

  res <- ddply(dataM, input$histDensityVarFactor, here(summarize),
               mean = round(mean(dataM[[input$histVar]]), 2),
               sd = round(sd(dataM[[numVar]]), 2))
  res

I get this output:

  tobacco   mean    sd
1      No 127.55 15.31
2     Yes 127.55 15.31

My question is simple: How do I get the mean and sd of my numerical variable by levels of the factor? The mean and sd that do get reported are the mean and sd for the entire sample.

Any help would be much appreciated. Thank you in advance. Best, Nathan

Upvotes: 0

Views: 485

Answers (1)

nate

Reputation: 1244

  tmp <- aggregate(dataM[[numVar]] ~ dataM[[factVar]], dataM,
                   FUN = function(x) {
                     c(n = noquote(sprintf("%.0f", length(x))),
                       mean = noquote(sprintf("%.4f", mean(x))),
                       sd = noquote(sprintf("%.4f", sd(x))),
                       # standard error of the mean: sd / sqrt(n)
                       se = noquote(sprintf("%.4f", sd(x) / sqrt(length(x)))))
                   })
  # Combine the factor column with the matrix of per-level statistics
  tmp <- cbind(tmp[1][1], tmp[2][, 1])
  names(tmp) <- c(noquote(input$histDensityVarFactor), "N", "Mean", "SD", "SE")
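A likely explanation for the identical rows in the question is that dataM[[...]] inside the ddply call always looks up the full data frame, not the per-group piece, so every level reports the overall mean and sd. The sketch below reproduces that outside Shiny and shows one possible ddply alternative to aggregate. The data frame contents and the names sbp and tobacco are made up for illustration; in the app, numVar and factVar would come from input$histVar and input$histDensityVarFactor.

```r
library(plyr)

set.seed(1)
# Hypothetical stand-in for dataM: one numeric and one factor-like column
dataM <- data.frame(sbp = rnorm(60, mean = 128, sd = 15),
                    tobacco = rep(c("No", "Yes"), 30))
numVar  <- "sbp"      # assumed value of input$histVar
factVar <- "tobacco"  # assumed value of input$histDensityVarFactor

# Refers to the full dataM inside every group, so each row repeats the overall mean
wrong <- ddply(dataM, factVar, function(piece) {
  data.frame(mean = round(mean(dataM[[numVar]]), 2))
})

# Uses the per-group piece that ddply passes to the function
right <- ddply(dataM, factVar, function(piece) {
  data.frame(mean = round(mean(piece[[numVar]]), 2),
             sd   = round(sd(piece[[numVar]]), 2))
})

print(wrong)
print(right)
```

Passing the grouping column as a character string and indexing the piece with [[ ]] avoids having to know the column names in advance, which fits the case where the user picks the variables at run time.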

Upvotes: 0
