user1043144

Reputation: 2710

Combining frequencies and summary statistics in one table?

I just discovered the power of plyr (frequency tables with several variables in R), and I am still struggling to understand how it works. I hope someone here can help me.

I would like to create a table (data frame) in which I can combine frequencies and summary stats but without hard-coding the values.

Here is an example dataset:

require(datasets)

d1 <- sleep
# I classify the variable extra to calculate the frequencies 
extraClassified <- cut(d1$extra, breaks = 3, labels = c('low', 'medium', 'high') )
d1 <- data.frame(d1, extraClassified)
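
As a quick check of what the classification produces, here is a minimal sketch in base R. The names `overall` and `by_group` are only illustrative, and the data are rebuilt inside the block so it runs on its own:

# Rebuild the example data (sleep ships with R's datasets package)
d1 <- datasets::sleep
d1$extraClassified <- cut(d1$extra, breaks = 3, labels = c("low", "medium", "high"))

# Counts per class over the whole data set
overall <- table(d1$extraClassified)

# Counts per class within each group (rows = group, columns = class)
by_group <- table(d1$group, d1$extraClassified)

print(overall)
print(by_group)

These are the raw frequencies that the per-group table further down is meant to reproduce.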

The result I am looking for should look like this:

  require(plyr)

  ddply(d1, "group", summarise,  
  All = length(ID), 

  nLow    = sum(extraClassified  == "low"),
  nMedium = sum(extraClassified  == "medium"),      
  nHigh =  sum(extraClassified  == "high"),

  PctLow     = round(sum(extraClassified  == "low")/ length(ID), digits = 1),
  PctMedium  = round(sum(extraClassified  == "medium")/ length(ID), digits = 1),      
  PctHigh    = round(sum(extraClassified  == "high")/ length(ID), digits = 1),

  xmean    = round(mean(extra), digits = 1),
  xsd    =   round(sd(extra), digits = 1))

My question: how can I do this without hard-coding the values?

For the record: I tried this code, but it does not work:

ddply (d1, "group", 
   function(i) c(table(i$extraClassified),     
   prop.table(as.character(i$extraClassified))),
   )

Thanks in advance

Upvotes: 1

Views: 2012

Answers (2)

user1043144

Reputation: 2710

Thanks to Joran. I slightly modified your function to make it more generic (without reference to the position of the variables).

require(plyr)
foo <- function(x, colfac, colval)
{
  # table with frequencies
  tbl    <- table(x[, colfac])
  # table with percentages
  tblpct <- t(prop.table(tbl))
  colnames(tblpct) <- paste(colnames(t(tbl)), 'Pct', sep = '')

  # put the first part together
  res <- cbind(n = nrow(x), t(tbl), tblpct)
  res <- as.data.frame(res)

  # add summary statistics
  res$mn <- mean(x[, colval])
  res$sd <- sd(x[, colval])
  res
}

ddply(d1,.(group),foo,colfac = "extraClassified",colval = "extra")

and it works!
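
To see roughly what ddply does with this function, here is a sketch in base R only: split the data by group, apply foo to each piece, and bind the results together. This is an illustration of the split-apply-combine idea, not plyr's actual implementation; the names `pieces`, `per_group` and `combined` are made up for the example.

# Rebuild the example data so the block runs on its own
d1 <- datasets::sleep
d1$extraClassified <- cut(d1$extra, breaks = 3, labels = c("low", "medium", "high"))

# Same function as above: counts, proportions, mean and sd for one piece
foo <- function(x, colfac, colval) {
  tbl <- table(x[, colfac])
  tblpct <- t(prop.table(tbl))
  colnames(tblpct) <- paste(colnames(t(tbl)), "Pct", sep = "")
  res <- as.data.frame(cbind(n = nrow(x), t(tbl), tblpct))
  res$mn <- mean(x[, colval])
  res$sd <- sd(x[, colval])
  res
}

# Split: one data frame per level of group
pieces <- split(d1, d1$group)

# Apply: one single-row summary per piece
per_group <- lapply(pieces, foo, colfac = "extraClassified", colval = "extra")

# Combine: stack the rows and add the group label as a column
combined <- do.call(rbind, per_group)
combined <- cbind(group = names(pieces), combined)
print(combined)

Because the factor levels survive the split, a class with zero observations in a group still gets its own column (with a count of 0).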

P.S.: I still do not understand what .(group) stands for, but

Upvotes: 2

joran

Reputation: 173667

Here's an example to get you started:

foo <- function(x,colfac,colval){
    tbl <- table(x[,colfac])
    res <- cbind(n = nrow(x),t(tbl),t(prop.table(tbl)))
    colnames(res)[5:7] <- paste(colnames(res)[5:7],"Pct",sep = "")
    res <- as.data.frame(res)
    res$mn <- mean(x[,colval])
    res$sd <- sd(x[,colval])
    res
}

ddply(d1,.(group),foo,colfac = "extraClassified",colval = "extra")

Don't take anything in that function foo as gospel. I just wrote that off the top of my head. Surely improvements/modifications are possible, but at least it's something to start with.
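
On the `.(group)` argument in the call above: in plyr, `.()` quotes variable names so they can be written without quotation marks, and as far as I know `.(group)` and `"group"` select the same grouping variable. A minimal sketch to check this, assuming plyr is installed (the names `by_quoted` and `by_string` are only illustrative):

library(plyr)

d1 <- datasets::sleep

# Grouping variable given with plyr's .() quoting helper
by_quoted <- ddply(d1, .(group), summarise, n = length(extra), xmean = mean(extra))

# Same grouping variable given as a character string
by_string <- ddply(d1, "group", summarise, n = length(extra), xmean = mean(extra))

print(by_quoted)
print(all.equal(by_quoted, by_string))

The `.()` form is mostly a convenience when grouping by several variables, e.g. `.(group, extraClassified)` instead of `c("group", "extraClassified")`.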

Upvotes: 2
