Aggregating factor level counts - by factor

Question

I have been trying to make a table displaying the counts of factor levels by another factor. For this, I looked on dozens of pages, questions... trying to use functions in some packages (dplyr, reshape) to get the job done, without any success in using them correctly.

That's what I got:

# my data:
var1 <- c("red","blue","red","blue","red","red","red","red","red","red","red","red","blue","red","blue")
var2 <- c("0","1","0","0","0","0","0","0","0","0","1","0","0","0","0")
var3 <- c("2","2","1","1","1","3","1","2","1","1","3","1","1","2","1")
var4 <- c("0","1","0","0","0","0","1","0","1","1","0","1","0","1","1")
mydata <- data.frame(var1,var2,var3,var4)
head(mydata)

Attempt n+1: displays only the total counts of factors by another factor.

t(aggregate(. ~ var1, mydata, sum))

      [,1]   [,2] 
var1 "blue" "red"
var2 " 5"   "12" 
var3 " 5"   "18" 
var4 " 6"   "16"

Attempt n+2: it's the correct format but I couldn't get it to work on more than one factor.

library(dplyr)
data1 <- ddply(mydata, c("var1", "var3"), summarise,
            N    = length(var1))
library(reshape)
df1 <- cast(data1, var1 ~ var3, sum)
df1 <- t(df1)
df1

   blue red
1    3   6
2    1   3
3    0   2

What I would like is:

        blue red
var2.0    3  10
var2.1    1   1
var3.1    3   6
var3.2    1   3
var3.3    0   2
var4.0    2   6
var4.1    2   5

How can I get this format? So many thanks in advance,

akrun · Accepted Answer

We can melt the dataset by 'var1' and then use table

library(reshape2)
tbl <- table(transform(melt(mydata, id.var="var1"),
        varN = paste(variable, value, sep="."))[c(4,1)])
names(dimnames(tbl)) <- NULL
tbl 
#
#         blue red
#  var2.0    3  10
#  var2.1    1   1
#  var3.1    3   6
#  var3.2    1   3
#  var3.3    0   2
#  var4.0    2   6
#  var4.1    2   5

Or using dplyr/tidyr, we convert the dataset from 'wide' to 'long' format with gather, then unite the columns ('var', 'val') to create 'varV', get the frequency (tally) after grouping by 'var1' and 'varV', and then spread to 'wide' format.

library(dplyr)
library(tidyr)
gather(mydata, var, val, -var1) %>% 
           unite(varV,var, val, sep=".") %>%
           group_by(var1, varV) %>% 
           tally() %>% 
           spread(var1, n, fill = 0)
#    varV  blue   red
#     
#1 var2.0     3    10
#2 var2.1     1     1
#3 var3.1     3     6
#4 var3.2     1     3
#5 var3.3     0     2
#6 var4.0     2     6
#7 var4.1     2     5

Aggregating factor level counts - by factor

Answers (1)

Related Questions