Descriptive Statistics by Group for multiple variables

Question

Given the following data frame (df):

Hup Hop testA   testB
Y   Hi  1   1
N   Lo  2   2
Y   Mi  3   3
N   No  4   4
Y   Hi  5   5
N   Lo  6   6
Y   Mi  7   7
N   No  8   8
Y   Hi  9   9
N   Lo  10  10
Y   Mi  11  11
N   No  12  12
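
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the data frame. The use of rep() and 1:12 is my reading of the printed table (an assumption), not necessarily how df was originally created.

```r
# Rebuild the example data shown above.
# Assumption: Hup alternates Y/N, Hop cycles through Hi/Lo/Mi/No,
# and testA/testB both run from 1 to 12, as in the printed table.
df <- data.frame(
  Hup   = rep(c("Y", "N"), times = 6),
  Hop   = rep(c("Hi", "Lo", "Mi", "No"), times = 3),
  testA = 1:12,
  testB = 1:12
)

# Inspect the structure and the first rows.
str(df)
head(df)
```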

I want the descriptive statistics (mean and sd) of testA and testB for each of the grouping variables Hup and Hop. I want something like this:

Hup testA.mean  testA.sd    testB.mean  testB.sd
Y   6           3.742       6           3.742
N   7           3.742       7           3.742
Hop testA.mean  testA.sd    testB.mean  testB.sd
Hi  etc         etc         etc         etc
Lo  etc         etc         etc         etc
Mi  etc         etc         etc         etc

Using, e.g., ddply(df, ~Hup, summarise, mean = round(mean(testA), 3), sd = round(sd(testA), 3)) would solve part of the problem. But I want to speed up the process and learn how to use R properly. So I thought of:

lapply(df[ , c("testA", "testB")], function(x){ ddply(df, ~df[ , c("Hup")], function(x) {mean(x)} )})

This does not work: it returns NAs, omits the SD, and reports results only for Hup.

Q: How do I produce descriptive statistics for several grouping variables and multiple variables at once?

adibender · Accepted Answer

For display, I think the tabular() function from the tables package is easiest:

library(tables)
tabular(Hup + Hop ~ (testA + testB)*(mean + sd + (n = 1)), data = df)
##       testA         testB        
##       mean  sd    n mean  sd    n
##Hup N  7     3.742 6 7     3.742 6
##    Y  6     3.742 6 6     3.742 6
##Hop Hi 5     4.000 3 5     4.000 3
##    Lo 6     4.000 3 6     4.000 3
##    Mi 7     4.000 3 7     4.000 3
##    No 8     4.000 3 8     4.000 3

You can also wrap the tabular() object in latex() to output the table in LaTeX syntax.
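
If you would rather get plain data frames with columns named testA.mean, testA.sd, and so on, as sketched in the question, something along these lines in base R might do it. This is only a sketch: describe_by() is a hypothetical helper name I made up, and the data frame is rebuilt here under the assumption that it matches the table in the question.

```r
# Rebuild the example data (assumed to match the question's table).
df <- data.frame(
  Hup   = rep(c("Y", "N"), times = 6),
  Hop   = rep(c("Hi", "Lo", "Mi", "No"), times = 3),
  testA = 1:12,
  testB = 1:12
)

# describe_by() is a hypothetical helper, not part of any package.
# It computes mean and sd of each column in `vars` within each level of `group`.
describe_by <- function(data, group, vars = c("testA", "testB")) {
  res <- aggregate(
    data[vars],
    by  = data[group],
    FUN = function(x) c(mean = mean(x), sd = round(sd(x), 3))
  )
  # aggregate() returns one matrix column per variable;
  # flatten them into testA.mean, testA.sd, testB.mean, testB.sd.
  do.call(data.frame, res)
}

# One summary table per grouping variable, returned as a named list.
results <- lapply(c(Hup = "Hup", Hop = "Hop"), describe_by, data = df)

results$Hup
results$Hop
```

The lapply() call loops over the grouping variables rather than over the test columns, which is what the attempt in the question was missing; aggregate() handles all test columns in one pass.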
