R data.table : creating a count table of values in multiple columns by using .N

Question

Here is my test data.table:

a<-data.table(cluster=sample(LETTERS[1:3], size = 10, replace = T), a=sample(x=1:2, size=10, replace = T), b=sample(x=1:2, size=10, replace = T), c=sample(x=1:2, size=10, replace = T), d=sample(x=1:3, size=10, replace=T))

a
    cluster a b c d
 1:       B 1 2 1 2
 2:       C 1 1 1 1
 3:       B 2 1 1 3
 4:       A 2 2 1 1
 5:       C 2 2 1 2
 6:       A 2 2 1 3
 7:       A 2 2 1 1
 8:       A 2 1 1 2
 9:       C 2 1 1 1
10:       C 2 2 1 1

I use the plyr package's count to generate a count table as follows:

> a[, lapply(.SD, function(x) count(x)), .SDcols=2:5]
   a.x a.freq b.x b.freq c.x c.freq d.x d.freq
1:   1      2   1      4   1     10   1      5
2:   2      8   2      6   1     10   2      3
3:   1      2   1      4   1     10   3      2

It is pretty ugly (the shorter columns get recycled to fill the rows) but it somewhat serves the purpose. The output I really want is as follows:

    a.x a.freq b.x b.freq c.x c.freq d.x d.freq
    1:   1      2   1      4   1     10   1      5
    2:   2      8   2      6  NA     NA   2      3
    3:   NA     NA  NA    NA  NA     NA   3      2

I would also like to group the counts by the cluster column if possible, but adding by=cluster fails. I've also tried uniqueN and .N, which work fine with a single column but not with multiple columns. At this point, I'd really appreciate any pointers.

akrun · Accepted Answer

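If we need to use .N, loop over the column names, group by each column in turn to get its .N, order each result by the grouping column, and then bind the per-column tables side by side with cbind.fill from rowr, which pads the shorter tables with NA: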

library(data.table)
do.call(rowr::cbind.fill, c(lapply(names(a)[-1], 
        function(nm) a[,  .N, by = nm][order(get(nm))]), fill = NA))
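
If the rowr dependency is not wanted, the same idea can be sketched with data.table and base R only. This is a minimal sketch, not part of the original answer: the names value_cols, counts, n_max, padded and wide are made up for illustration, and the column names a.x / a.freq are chosen to mirror the output requested in the question. Padding relies on the base R behaviour that indexing a vector past its end yields NA.

```r
library(data.table)

# Same data as in the question (hard-coded so the example is reproducible)
a <- data.table(
  cluster = c("B", "C", "B", "A", "C", "A", "A", "A", "C", "C"),
  a = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  b = c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L),
  c = rep(1L, 10),
  d = c(2L, 1L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 1L)
)

value_cols <- setdiff(names(a), "cluster")

# One count table per column, sorted by value, renamed to <col>.x / <col>.freq
counts <- lapply(value_cols, function(nm) {
  out <- a[, .N, by = nm]
  setorderv(out, nm)
  setnames(out, c(paste0(nm, ".x"), paste0(nm, ".freq")))
  out
})

# Length of the longest count table
n_max <- max(vapply(counts, nrow, integer(1)))

# Pad every column to n_max rows; out-of-range positions become NA
padded <- lapply(counts, function(dt) {
  as.data.table(lapply(dt, `[`, seq_len(n_max)))
})

# Bind the padded tables side by side
wide <- do.call(cbind, padded)
wide
```

Each per-column table is padded separately, so columns with fewer distinct values end in NA instead of being recycled.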

data

a <- structure(list(cluster = c("B", "C", "B", "A", "C", "A", "A", 
"A", "C", "C"), a = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    b = c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L), c = c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), d = c(2L, 1L, 3L, 1L, 
    2L, 3L, 1L, 2L, 1L, 1L)), class = c("data.table", "data.frame"
), row.names = c(NA, -10L))
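
The question also asks about grouping by cluster, which the code above does not cover. One possible approach, offered as a sketch rather than as part of the original answer, is to reshape to long format with melt and count with .N there. The result is a long table rather than the wide layout shown in the question, and the names long, overall, by_cluster, column and value are illustrative choices.

```r
library(data.table)

# Same data as above, written out with data.table()
a <- data.table(
  cluster = c("B", "C", "B", "A", "C", "A", "A", "A", "C", "C"),
  a = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  b = c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L),
  c = rep(1L, 10),
  d = c(2L, 1L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 1L)
)

# Long format: one row per (cluster, original column, value)
long <- melt(a, id.vars = "cluster",
             variable.name = "column", value.name = "value")

# Counts per column and value over the whole table
overall <- long[, .N, keyby = .(column, value)]

# The same counts split by cluster
by_cluster <- long[, .N, keyby = .(cluster, column, value)]
by_cluster
```

In long format, adding cluster to keyby is enough to split the counts by group, and no padding is needed because absent combinations simply have no row.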
