Reputation: 1244
I am posting my question with a reproducible example. The problem: I cannot get ddply to report the mean systolic blood pressure across different factors, i.e. tobacco use, gender, etc. when using Shiny. I can generate the appropriate output in RStudio, but cannot get the function to report the specific values by factor level when using Shiny.
An example of the code that works in R:
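library(plyr)  # provides ddply() and summarize()
a <- runif(99, 0, 5)
b <- rep(c("A", "B", "C"), 33)
df2 <- data.frame(numVar = a, factVar = b)
# Mean and sd of numVar within each level of factVar
res <- ddply(df2, .(factVar), summarize,
             mean = round(mean(numVar), 2),
             sd = round(sd(numVar), 2))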
To use my Shiny application, the user first uploads a .csv file. The file is then split into numeric and factor variables. The variables are then stored in reactive dataframes named passdfnum and passdffact, respectively. When the user is on my histogram tab, they select a numeric variable (input$histVar) and a factor variable (input$histDensityVarFactor). I then create a temporary dataframe dataM in the subsetData function that consists of only the selected fields using this code:
subsetData <- reactive({
  if (input$histDensityVarFactor != "None") {
    dataM <- data.frame(numVar = passdfnum()[input$histVar],
                        factVar = passdffact()[input$histDensityVarFactor])
  }
})
This generates a data frame containing the row names plus the columns named by input$histVar and input$histDensityVarFactor. Note: the structure of this data frame is the same as that of the data frame in the example.
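One detail worth checking here is what the columns of that data frame end up being called. The sketch below uses made-up stand-ins (num_df, fact_df, and the column names sbp and tobacco are all hypothetical) for the reactive data frames, and builds dataM the same way subsetData does:

```r
# Hypothetical stand-ins for passdfnum() and passdffact(); the column names are made up.
num_df  <- data.frame(sbp = c(120, 135, 128, 142))
fact_df <- data.frame(tobacco = c("No", "Yes", "No", "Yes"))

# Single-bracket indexing returns one-column data frames, as in subsetData().
dataM <- data.frame(numVar = num_df["sbp"], factVar = fact_df["tobacco"])

# Inspect which column names the result actually carries.
print(names(dataM))
str(dataM)
```

Because each argument is itself a one-column data frame, data.frame() keeps the original column names rather than the numVar and factVar labels, so the columns are reached as dataM[[input$histVar]] and dataM[[input$histDensityVarFactor]] rather than dataM$numVar and dataM$factVar.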
I create a graph of the data in ggplot2, just fine. Then I go to create a summary table of the numerical variable by factor level and everything goes to heck.
The code I am using in Shiny is this:
output$histMeans <- renderPrint({
  if (input$histDensityVarFactor != "None") {
    numVar <- noquote(names(passdfnum()[input$histVar]))
    factVar <- noquote(names(passdffact()[input$histDensityVarFactor]))
    res <- ddply(dataM, .(dataM[[factVar]]), here(summarize),
                 mean = round(mean(dataM[[input$histVar]]), 2),
                 sd = round(sd(dataM[[numVar]]), 2))
    res
  }
})
The resulting output looks like this (note: factor variables are coded as Yes/No or High/Medium/Low):
  dataM[[factVar]]   mean    sd
1               No 127.55 15.31
2              Yes 127.55 15.31
An interesting note: if I use input$histDensityVarFactor instead of .(dataM[[factVar]]), such as in this:
res <- ddply(dataM, input$histDensityVarFactor, here(summarize),
             mean = round(mean(dataM[[input$histVar]]), 2),
             sd = round(sd(dataM[[numVar]]), 2))
res
I get this output:
  tobacco   mean    sd
1      No 127.55 15.31
2     Yes 127.55 15.31
My question is simple: How do I get the mean and sd of my numerical variable by levels of the factor? The mean and sd that do get reported are the mean and sd for the entire sample.
Any help would be much appreciated. Thank you in advance. Best, Nathan
Upvotes: 0
Views: 485
Reputation: 1244
# Summarize the numeric variable within each level of the factor.
tmp <- aggregate(dataM[[numVar]] ~ dataM[[factVar]], dataM,
                 FUN = function(x) {
                   c(n    = sprintf("%.0f", length(x)),
                     mean = sprintf("%.4f", mean(x)),
                     sd   = sprintf("%.4f", sd(x)),
                     # standard error of the mean is sd / sqrt(n)
                     se   = sprintf("%.4f", sd(x) / sqrt(length(x))))
                 })
# aggregate() returns the statistics as one matrix column; spread it into ordinary columns.
tmp <- cbind(tmp[1], tmp[[2]])
names(tmp) <- c(input$histDensityVarFactor, "N", "Mean", "SD", "SE")
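For anyone who would rather stay with ddply, a likely reason the original call repeated the whole-sample values is that dataM[[numVar]] inside summarize names the full data frame explicitly, so every group is computed from all rows instead of from its own piece. A minimal sketch of one way around that, assuming the plyr package is installed and using made-up data (the column names sbp and tobacco are hypothetical stand-ins for input$histVar and input$histDensityVarFactor):

```r
library(plyr)

# Made-up data standing in for dataM; in the app these names would come from
# input$histVar and input$histDensityVarFactor.
set.seed(1)
dataM <- data.frame(
  sbp     = c(rnorm(50, mean = 120, sd = 10), rnorm(50, mean = 135, sd = 10)),
  tobacco = rep(c("No", "Yes"), each = 50)
)
numVar  <- "sbp"      # hypothetical value of input$histVar
factVar <- "tobacco"  # hypothetical value of input$histDensityVarFactor

# Group by the column name held in factVar, and index the per-group piece
# (not the full dataM) inside the function applied to each group.
res <- ddply(dataM, factVar, function(piece) {
  data.frame(mean = round(mean(piece[[numVar]]), 2),
             sd   = round(sd(piece[[numVar]]), 2))
})
print(res)
```

Passing the grouping variable as a character string and using an anonymous function avoids the non-standard evaluation in summarize altogether, which tends to be easier to reason about when the column names are only known at run time.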
Upvotes: 0