Mario
Mario

Reputation: 2621

data.table: Sum by all existing combinations in table

I have a data.table out like this (in reality it is much larger):

out <-      code weights group
        1:    2   0.387      1
        2:    1   0.399      1
        3:    2   1.610      1
        4:    3   1.323      2
        5:    2   0.373      2                                            
        6:    1   0.212      2
        7:    3   0.316      3
        8:    2   0.569      3
        9:    1   0.120      3
       10:    1   0.354      3

It has 3 groups with different codes (column 1). In group #1, the code 3 does not appear, while in the other it appears.

Then, I want to sum the weights for every group and code combination . I achieve this with this command:

sum.dt <- out[,.(sum(weights)), by=list(code,group)][order(-V1)]

This works well but it does not have the combination Group 1 with Code 3 because it is not in the out table. I would like to have all possible combinations in sum.dt, and if the combination does not occur in the source table, it should sum up to 0, meaning the column V1 should be 0 in this row.

Any idea how I could achieve this?

Upvotes: 5

Views: 1433

Answers (2)

Jaap
Jaap

Reputation: 83215

Using CJ (cross join) you can add the missing combinations:

library(data.table)
setkey(out, code, group)
out[CJ(code, group, unique = TRUE)
    ][, lapply(.SD, sum), by = .(code, group)
      ][is.na(weights), weights := 0]

gives:

   code group weights
1:    1     1   0.399
2:    1     2   0.212
3:    1     3   0.474
4:    2     1   1.997
5:    2     2   0.373
6:    2     3   0.569
7:    3     1   0.000
8:    3     2   1.323
9:    3     3   0.316

Or with xtabs as @alexis_laz showed in the comments:

xtabs(weights ~ group + code, out)

which gives:

     code
group     1     2     3
    1 0.399 1.997 0.000
    2 0.212 0.373 1.323
    3 0.474 0.569 0.316

If you want to get this output in a long-form dataframe, you can wrap the xtabs code in the melt function of the reshape2 (or data.table) package:

library(reshape2)
res <- melt(xtabs(weights ~ group + code, out))

which gives:

> class(res)
[1] "data.frame"
> res
  group code value
1     1    1 0.399
2     2    1 0.212
3     3    1 0.474
4     1    2 1.997
5     2    2 0.373
6     3    2 0.569
7     1    3 0.000
8     2    3 1.323
9     3    3 0.316

You could also do this with a combination of dplyr and tidyr:

library(dplyr)
library(tidyr)
out %>%
  complete(code, group, fill = list(weights=0)) %>%
  group_by(code, group) %>% 
  summarise(sum(weights))

Upvotes: 7

tadejsv
tadejsv

Reputation: 2092

I had a similar problem, and CJ did not work for some reason. A relatively simple solution I ended up using is first calling dcast and then melt (similar to the xtable solution above)- this also conveniently lets you specify the fill value for the missing combinations.

sum.dt <- dcast(out, code ~ group, value.var = 'weights', 
                fun.aggregate = sum, fill = 0)
sum.dt <- melt(sum.dt, id.vars = 'code', variable.name = 'group')

This gives

> sum.dt
   code group value
1:    1     1 0.399
2:    2     1 1.997
3:    3     1 0.000
4:    1     2 0.212
5:    2     2 0.373
6:    3     2 1.322
7:    1     3 0.474
8:    2     3 0.569
9:    3     3 0.316

Upvotes: 1

Related Questions