watchtower

Reputation: 4298

Add descriptive statistics to a grouped dataset

I have a table of descriptive statistics created with stat.desc() from the pastecs package. The problem is that I had to store the per-group results in a list column, and I am now unable to unlist it cleanly. I found the "R list to data frame" thread, but that approach requires creating a temporary data.frame. The actual data I am dealing with is large, so creating a temporary data frame is not really practical.

Here's my code (you will need the pastecs package; it is already loaded on my system):

dput(df)
structure(list(group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
4L), .Label = c("A", "B", "C", "D"), class = "factor"), dt = c(60, 
60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66, 71, 67, NA, 68, 56, 
NA, 60, 61, 63, 64, 63, 59)), .Names = c("group", "dt"), row.names = c(NA, 
-24L), class = "data.frame")

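library(pastecs)  # provides stat.desc()

# Convert to data.table by reference
data.table::setDT(df)
# One list-column entry per group, holding the named vector returned by stat.desc()
df1 <- df[, .(newvar = list(stat.desc(dt))), by = group]

# Temporary data frame: one row per group, one column per statistic
b <- data.frame(matrix(unlist(df1$newvar, use.names = TRUE), nrow = nrow(df1), byrow = TRUE), stringsAsFactors = FALSE)
names(b) <- names(df1$newvar[[1]])

# Replace the list column with the unpacked statistics
df1$newvar <- NULL
df1 <- cbind(df1, b)
rm(b)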

Here, b is the temporary table, which I would like to avoid.

Expected output:

structure(list(group = structure(1:4, .Label = c("A", "B", "C", 
"D"), class = "factor"), nbr.val = c(4, 8, 6, 4), nbr.null = c(0, 
0, 0, 0), nbr.na = c(0, 0, 2, 0), min = c(59, 63, 56, 59), max = c(63, 
71, 71, 64), range = c(4, 8, 15, 5), sum = c(242, 530, 383, 249
), median = c(60, 66, 64, 63), mean = c(60.5, 66.25, 63.8333333333333, 
62.25), SE.mean = c(0.866025403784439, 0.881354477089505, 2.32975916733421, 
1.10867789130417), CI.mean.0.95 = c(2.75607934655562, 2.08407217077572, 
5.9888365969565, 3.5283078589307), var = c(3, 6.21428571428571, 
32.5666666666667, 4.91666666666667), std.dev = c(1.73205080756888, 
2.49284690951645, 5.70672118354022, 2.21735578260835), coef.var = c(0.0286289389680806, 
0.0376278778794936, 0.0894003318570269, 0.0356201732145919)), .Names = c("group", 
"nbr.val", "nbr.null", "nbr.na", "min", "max", "range", "sum", 
"median", "mean", "SE.mean", "CI.mean.0.95", "var", "std.dev", 
"coef.var"), row.names = c(NA, -4L), class = "data.frame")

Sorry if this is too basic. I am looking for a faster way to do this (i.e. no intermediate table, and preferably a solution that uses data.table).

Thanks for your time.

Upvotes: 0

Views: 507

Answers (1)

When stat.desc is applied to a data.frame (here, .SD), its output is a data.frame with the statistic names as row names. Converting that to a data.table with the keep.rownames = TRUE argument keeps those names in a column called rn, so the long format can be produced in a single line of code. A dcast then reshapes it into the expected wide output:

library(data.table)
library(pastecs)
df <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
   2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
   4L), .Label = c("A", "B", "C", "D"), class = "factor"), dt = c(60, 
    60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66, 71, 67, NA, 68, 56, 
    NA, 60, 61, 63, 64, 63, 59)), .Names = c("group", "dt"), row.names = c(NA, 
   -24L), class = "data.frame")

# Convert to data.table
dt <- data.table(df)
# stat.desc returns a data.frame with row names; convert to data.table and keep them in column "rn"
dt_melt <- dt[, data.table(stat.desc(.SD), keep.rownames = TRUE), by = .(group)]
# Cast to wide format with group as the ID variable and each row name as a column
out <- dcast(dt_melt, group ~ rn, value.var = "dt")

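The output is (note that dcast sorts the new columns by name, so their order differs from the expected output in the question):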

   group CI.mean.0.95   SE.mean   coef.var max     mean median min nbr.na nbr.null nbr.val range  std.dev sum       var
1:     A     2.756079 0.8660254 0.02862894  63 60.50000     60  59      0        0       4     4 1.732051 242  3.000000
2:     B     2.084072 0.8813545 0.03762788  71 66.25000     66  63      0        0       8     8 2.492847 530  6.214286
3:     C     5.988837 2.3297592 0.08940033  71 63.83333     64  56      2        0       6    15 5.706721 383 32.566667
4:     D     3.528308 1.1086779 0.03562017  64 62.25000     63  59      0        0       4     5 2.217356 249  4.916667
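
A possible alternative, not part of the answer above: since the question calls stat.desc on a plain vector, which returns a named numeric vector, wrapping that result in as.list() inside j lets data.table spread the statistics into columns directly. This is only a sketch under that assumption. It skips the long format and the dcast step, and it should keep the columns in the order stat.desc produces them. The data frame is rebuilt here with rep() purely for brevity, and the name wide is illustrative.

```r
library(data.table)
library(pastecs)

# Same data as in the question, rebuilt compactly
df <- data.frame(
  group = factor(rep(c("A", "B", "C", "D"), times = c(4, 8, 8, 4))),
  dt = c(60, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
         71, 67, NA, 68, 56, NA, 60, 61, 63, 64, 63, 59)
)
setDT(df)

# stat.desc() on a vector returns a named numeric vector;
# as.list() turns each named element into its own column, per group
wide <- df[, as.list(stat.desc(dt)), by = group]
print(wide)
```

If the result needs to be a plain data.frame, as in the expected output, setDF(wide) converts it by reference without making a copy.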

Upvotes: 5
