Reputation: 39
I have a dataframe that looks similar to this one:
date uid duration
1 29.03.2020 0zOs6ZS9 1
2 29.03.2020 0zOs6ZS9 5
3 29.03.2020 0zOs6ZS9 2
4 31.03.2020 0zOs6ZS9 6
5 01.04.2020 0zOs6ZS9 7
6 01.04.2020 0zOs6ZS9 4
7 29.03.2020 0zOs6ZS9 3
8 29.03.2020 3jtMiD 2
9 30.03.2020 3jtMiD 7
10 30.03.2020 3jtMiD 5
11 31.03.2020 3jtMiD 1
12 02.04.2020 3jtMiD 2
My goal is to sum the durations over date and uid but if there is a measure for one user but not for the other for one date I actually want this row to be added with 0 duration.
So rather than the following df I achieved by
df2 <- df1 %>%
group_by (date,uid) %>%
summarise(duration =sum(duration, na.rm = TRUE))
date uid duration
<date> <chr> <dbl>
1 2020-03-29 0zOs6ZS9 11
2 2020-03-29 3jtMiD 2
3 2020-03-30 3jtMiD 12
4 2020-03-31 0zOs6ZS9 6
5 2020-03-31 3jtMiD 1
6 2020-04-01 0zOs6ZS9 11
7 2020-04-02 3jtMiD 2
I want a df that looks like this:
date1 uid1 duration1
1 2020-03-29 0zOs6ZS9 11
2 2020-03-30 0zOs6ZS9 0
3 2020-03-31 0zOs6ZS9 6
4 2020-04-01 0zOs6ZS9 11
5 2020-04-02 0zOs6ZS9 0
6 2020-03-29 3jtMiD 2
7 2020-03-30 3jtMiD 12
8 2020-03-31 3jtMiD 1
9 2020-04-01 3jtMiD 0
10 2020-04-02 3jtMiD 2
How can I do this?
Upvotes: 1
Views: 27
Reputation: 887681
We can use complete
library(tidyr)
library(dplyr)
df2 %>%
ungroup %>%
complete(date, uid, fill = list(duration = 0))
-output
# A tibble: 10 x 3
# date uid duration
# <date> <chr> <dbl>
# 1 2020-03-29 0zOs6ZS9 11
# 2 2020-03-29 3jtMiD 2
# 3 2020-03-30 0zOs6ZS9 0
# 4 2020-03-30 3jtMiD 12
# 5 2020-03-31 0zOs6ZS9 6
# 6 2020-03-31 3jtMiD 1
# 7 2020-04-01 0zOs6ZS9 11
# 8 2020-04-01 3jtMiD 0
# 9 2020-04-02 0zOs6ZS9 0
#10 2020-04-02 3jtMiD 2
df2 <- structure(list(date = structure(c(18350, 18350, 18351, 18352,
18352, 18353, 18354), class = "Date"), uid = c("0zOs6ZS9", "3jtMiD",
"3jtMiD", "0zOs6ZS9", "3jtMiD", "0zOs6ZS9", "3jtMiD"), duration = c(11L,
2L, 12L, 6L, 1L, 11L, 2L)), row.names = c("1", "2", "3", "4",
"5", "6", "7"), class = "data.frame")
Upvotes: 1