biostatguy12
biostatguy12

Reputation: 659

is there a way in R to fill missing groups absent of observations?

Say I have something like:

df<-data.frame(group=c(1, 1,1, 2,2,2,3,3,3,4,4, 1, 1,1), 
               group2=c(1,2,3,1,2,3,1,2,3,1,3, 1,2,3))
 group group2
1   1   1
2   1   2
3   1   3
4   2   1
5   2   2
6   2   3
7   3   1
8   3   2
9   3   3
10  4   1
11  4   3
12  1   1
13  1   2
14  1   3

My goal is to count the number of unique instances for group= something and group2= something. Like so:

df1<-df%>%group_by(group, group2)%>% mutate(want=n())%>%distinct(group, group2, .keep_all=TRUE)

   group group2  want
   <dbl>  <dbl> <int>
 1     1      1     2
 2     1      2     2
 3     1      3     2
 4     2      1     1
 5     2      2     1
 6     2      3     1
 7     3      1     1
 8     3      2     1
 9     3      3     1
10     4      1     1
11     4      3     1

however, notice that group=4, group2=2 was not in my dataset to begin with. Is there some sort of autofill function where I can fill these non-observations with a zero to get below easily?:

   group group2  want
   <dbl>  <dbl> <int>
 1     1      1     2
 2     1      2     2
 3     1      3     2
 4     2      1     1
 5     2      2     1
 6     2      3     1
 7     3      1     1
 8     3      2     1
 9     3      3     1
10     4      1     1
11     4      2     0
12     4      3     1

Upvotes: 2

Views: 73

Answers (2)

B. Christian Kamgang
B. Christian Kamgang

Reputation: 6529

Here is a data.table approach solution to this problem:

library(data.table)
setDT(df)[CJ(group, group2, unique = TRUE), 
          c(.SD, .(want = .N)), .EACHI, 
          on = c("group", "group2")]

# group group2  want
#     1      1     2
#     1      2     2
#     1      3     2
#     2      1     1
#     2      2     1
#     2      3     1
#     3      1     1
#     3      2     1
#     3      3     1
#     4      1     1
#     4      2     0
#     4      3     1

Upvotes: 1

akrun
akrun

Reputation: 887901

After getting the count, we can expand with complete to fill the missing combinations with 0

library(dplyr)
library(tidyr)
df %>% 
   count(group, group2) %>% 
   complete(group, group2, fill = list(n = 0))
# A tibble: 12 x 3
#   group group2     n
#   <dbl>  <dbl> <dbl>
# 1     1      1     2
# 2     1      2     2
# 3     1      3     2
# 4     2      1     1
# 5     2      2     1
# 6     2      3     1
# 7     3      1     1
# 8     3      2     1
# 9     3      3     1
#10     4      1     1
#11     4      2     0
#12     4      3     1

Or if we do the group_by, instead of mutate and then do the distinct, directly use the summarise

df %>%
   group_by(group, group2) %>%
   summarise(n = n()) %>%
   ungroup %>%
   complete(group, group2, fill = list(n = 0))

Upvotes: 1

Related Questions