Reputation: 85
I have a data.table that looks like the following:
ID Date Team MonthFactor
1 2512 2015-04-24 Purple 2015-04
2 2512 2015-04-25 Purple 2015-04
3 2512 2015-04-26 Purple 2015-04
4 2512 2015-04-27 Purple 2015-04
I would like to get the number of rows grouped by both Team
and MonthFactor
, including when there are no rows from a given month, IE if purple team had no entries in the month of May but yellow did, the summarized table would look like:
Team MonthFactor N
1 Purple 2015-04 10
2 Purple 2015-05 0
3 Yellow 2015-04 5
4 Yellow 2015-05 7
Doing this would be trivial if I didn't need the "empty" groups, but I can't wrap my head around how to specify the groups that need to be evaluated when there might not be rows that contain a given monthFactor.
Upvotes: 0
Views: 375
Reputation: 83255
You can achieve that by using a cross-join:
dat[, .N, .(Team, MonthFactor)
][CJ(Team, MonthFactor, unique = TRUE), on = c(Team = "V1", MonthFactor = "V2")
][is.na(N), N := 0][]
this gives:
Team MonthFactor N
1: Purple 2015-04 2
2: Purple 2015-05 0
3: Yellow 2015-04 5
4: Yellow 2015-05 3
The advantage of this method is that it is easier to include other variables as well. Supposing that ID
is just a numeric value, consider this example:
dat[, .(.N, sID = sum(ID)), .(Team, MonthFactor)
][CJ(Team, MonthFactor, unique = TRUE), on = c(Team = "V1", MonthFactor = "V2")
][is.na(N), `:=` (N = 0, sID = 0)][]
which gives:
Team MonthFactor N sID
1: Purple 2015-04 2 5024
2: Purple 2015-05 0 0
3: Yellow 2015-04 5 12560
4: Yellow 2015-05 3 7536
Used data:
dat <- structure(list(ID = c(2512L, 2512L, 2512L, 2512L, 2512L, 2512L, 2512L, 2512L, 2512L, 2512L),
Date = structure(c(1L, 2L, 1L, 2L, 3L, 4L, 4L, 2L, 3L, 4L), .Label = c("2015-04-24", "2015-04-25", "2015-04-26", "2015-04-27"), class = "factor"),
Team = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Purple", "Yellow"), class = "factor"),
MonthFactor = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("2015-04", "2015-05"), class = "factor")),
.Names = c("ID", "Date", "Team", "MonthFactor"), class = c("data.table", "data.frame"), row.names = c(NA, -10L))
Upvotes: 5