Simplify R code for computing group statistics across multiple factors

Question

I am trying to compute the variance for each group in a data set with multiple factors. For example, the data set below is the first 6 lines of a data frame with 5 columns: 4 factors of two levels each (No and Yes) and 1 continuous variable:

Factor A	Factor B	Factor C	Factor D	VarX
Yes	Yes	Yes	No	66.8
No	Yes	Yes	No	66.0
Yes	No	No	No	58.4
No	Yes	Yes	Yes	68.3
Yes	Yes	Yes	No	61.8
Yes	No	No	No	67.3

What I want to do is produce a summary table such as the one below:

Factor	SD (NO)	SD (YES)	SD Ratio
Factor A	3.79	3.51	1.08
Factor B	3.44	3.83	1.11
Factor C	3.77	3.53	1.07
Factor D	3.92	3.32	1.18

For each factor, I have calculated the standard deviation at each level ("No" and "Yes") as well as the ratio of the two standard deviations.

Here is the code I am using to do this:

#
# Define modify function for SD ratio column
#
sd_ratio<-function(x,y){
  return(max(x,y)/min(x,y))
}
#
# Set up storage
#
nc<-4 # number of factors in data
testDataSum<-tibble(SD_No=rep(NA,nc),
                   SD_Yes=rep(NA,nc),
                   SD_Ratio=rep(NA,nc))
#
Factor<-vector("list",4)
SDList<-vector("list",4)
#
# For Loop. Group data by factors 1,2,3,4
#
for (i in 1:4){
  Factor[[i]]<-names(testData[,i])
  SDList[[i]]<-testData %>% 
  group_by(testData[,i])%>%
  summarize(SD=sd(VarX))
}
# Load summary DF with data by unlisting SDList
#
testDataSum$SD_No<-as.vector(matrix(unlist(SDList),ncol=4,byrow=T)[,3])
testDataSum$SD_Yes<-as.vector(matrix(unlist(SDList),ncol=4,byrow=T)[,4])
testDataSum$SD_Ratio=modify2(testDataSum$SD_No,testDataSum$SD_Yes,sd_ratio)
#
# Load formatted factor names and put it at the front
#
testDataSum<-testDataSum %>% 
  mutate(Factor=unlist(Factor)) %>%
  relocate(Factor)
# Show results
testDataSum

My request is for help in simplifying this code. This works but it seems horribly ugly and complex, not to mention difficult to come back to at a later date and modify. I believe there is a much simpler way to do it without a for-loop, and without the ungainly process of unlisting SDList using the "as.vector (matrix (..." lines. I have reviewed the documentation for DPLYR and PLYR, especially the grouping section, but I am baffled. Any suggestions are much appreciated.

Here is a link to a github repository with the code and a csv file with 192 rows that you can use to produce the result table.

Git Hub Link for code and Data

Simplify R code for computing group statistics across multiple factors

Answers (1)

Related Questions