Group over two variables with aditional rows

Question

I have a dataframe that looks similar to this one:

         date      uid duration
1  29.03.2020 0zOs6ZS9        1
2  29.03.2020 0zOs6ZS9        5
3  29.03.2020 0zOs6ZS9        2
4  31.03.2020 0zOs6ZS9        6
5  01.04.2020 0zOs6ZS9        7
6  01.04.2020 0zOs6ZS9        4
7  29.03.2020 0zOs6ZS9        3
8  29.03.2020   3jtMiD        2
9  30.03.2020   3jtMiD        7
10 30.03.2020   3jtMiD        5
11 31.03.2020   3jtMiD        1
12 02.04.2020   3jtMiD        2

My goal is to sum the durations over date and uid but if there is a measure for one user but not for the other for one date I actually want this row to be added with 0 duration.

So rather than the following df I achieved by

df2 <- df1 %>%
  group_by (date,uid) %>%
  summarise(duration =sum(duration, na.rm = TRUE))

  date       uid      duration
              
1 2020-03-29 0zOs6ZS9       11
2 2020-03-29 3jtMiD          2
3 2020-03-30 3jtMiD         12
4 2020-03-31 0zOs6ZS9        6
5 2020-03-31 3jtMiD          1
6 2020-04-01 0zOs6ZS9       11
7 2020-04-02 3jtMiD          2

I want a df that looks like this:

        date1     uid1 duration1
1  2020-03-29 0zOs6ZS9        11
2  2020-03-30 0zOs6ZS9         0
3  2020-03-31 0zOs6ZS9         6
4  2020-04-01 0zOs6ZS9        11
5  2020-04-02 0zOs6ZS9         0
6  2020-03-29   3jtMiD         2
7  2020-03-30   3jtMiD        12
8  2020-03-31   3jtMiD         1
9  2020-04-01   3jtMiD         0
10 2020-04-02   3jtMiD         2

How can I do this?

akrun · Accepted Answer

We can use complete

library(tidyr)
library(dplyr)
df2 %>%
  ungroup %>%
  complete(date, uid, fill = list(duration = 0))

-output

# A tibble: 10 x 3
#   date       uid      duration
#               
# 1 2020-03-29 0zOs6ZS9       11
# 2 2020-03-29 3jtMiD          2
# 3 2020-03-30 0zOs6ZS9        0
# 4 2020-03-30 3jtMiD         12
# 5 2020-03-31 0zOs6ZS9        6
# 6 2020-03-31 3jtMiD          1
# 7 2020-04-01 0zOs6ZS9       11
# 8 2020-04-01 3jtMiD          0
# 9 2020-04-02 0zOs6ZS9        0
#10 2020-04-02 3jtMiD          2

data

df2 <- structure(list(date = structure(c(18350, 18350, 18351, 18352, 
18352, 18353, 18354), class = "Date"), uid = c("0zOs6ZS9", "3jtMiD", 
"3jtMiD", "0zOs6ZS9", "3jtMiD", "0zOs6ZS9", "3jtMiD"), duration = c(11L, 
2L, 12L, 6L, 1L, 11L, 2L)), row.names = c("1", "2", "3", "4", 
"5", "6", "7"), class = "data.frame")

Group over two variables with aditional rows

Answers (1)

data

Related Questions