seakyourpeak
seakyourpeak

Reputation: 551

group by count when count is zero in r

I use aggregate function to get count by group. The aggregate function only returns count for groups if count > 0. This is what I have

dt <- data.frame(
n  = c(1,2,3,4,5,6),
id = c('A','A','A','B','B','B'),
group = c("x","x","y","x","x","x")) 

applying the aggregate function

my.count <- aggregate(n ~ id+group, dt, length)

now see the results

my.count[order(my.count$id),]

I get following

id group   n
1  A     x 2
3  A     y 1
2  B     x 3

I need the following (the last row has zero that i need)

id group   n
1  A     x 2
3  A     y 1
2  B     x 3
4  B     y 0

thanks for you help in in advance

Upvotes: 4

Views: 1652

Answers (4)

stok
stok

Reputation: 375

aggregate with drop=FALSE worked for me.

my.count <- aggregate(n ~ id+group, dt, length, drop=FALSE)
my.count[is.na(my.count)] <- 0
my.count
#   id group n
# 1  A     x 2
# 2  B     x 3
# 3  A     y 1
# 4  B     y 0

Upvotes: 2

MarkusN
MarkusN

Reputation: 3223

If you are interested in frequencies only, you create with your formula a frequency table an turn it into a dataframe:

as.data.frame(xtabs(formula = ~ id + group, dt))

Obviously this won't work for other aggregate functions. I'm still waiting for dplyr's summarise function to let the user decide whether zero-groups are kept or not. Maybe you can vote for this improvement here: https://github.com/hadley/dplyr/issues/341

Upvotes: 0

akrun
akrun

Reputation: 887541

We can create another column 'ind' and then use dcast to reshape from 'long' to 'wide', specifying the fun.aggregate as length and drop=FALSE.

library(reshape2)
dcast(transform(dt, ind='n'), id+group~ind,
           value.var='n', length, drop=FALSE)
#  id group n
#1  A     x 2
#2  A     y 1
#3  B     x 3
#4  B     y 0

Or a base R option is

 as.data.frame(table(dt[-1]))

Upvotes: 6

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193647

You can merge your "my.count" object with the complete set of "id" and "group" columns:

merge(my.count, expand.grid(lapply(dt[c("id", "group")], unique)), all = TRUE)
##   id group  n
## 1  A     x  2
## 2  A     y  1
## 3  B     x  3
## 4  B     y NA

There are several questions on SO that show you how to replace NA with 0 if that is required.

Upvotes: 3

Related Questions