Reputation: 39
Hello, I'm new to R and I don't understand why the following approach does not work. I have a data frame df1 that looks something like this:
view duration_hours date
1 a 5 2021-03-29
2 a 7 2021-03-29
3 a 3 2021-03-30
4 b 2 2021-03-30
5 b 5 2021-03-30
6 c 9 2021-03-30
7 c 2 2021-03-31
8 c 3 2021-04-01
I want a new data frame (df2) that sums the durations for each date, both over all views and split by individual view:
date duration_sum a b c
1 2021-03-29 12 12 0 0
2 2021-03-30 19 3 7 9
3 2021-03-31 2 0 0 2
4 2021-04-01 3 0 0 3
First, I tried the following just for the overall duration. It worked as intended, creating the duration_sum variable with the summed durations for every date:
df2 <- df1 %>%
  group_by(date) %>%
  summarise(duration_sum = sum(duration_hours, na.rm = TRUE))
Then I tried to add the other variables by extending the code as follows:
df2<- df1 %>%
group_by(date) %>%
summarise(duration_sum = sum(duration_hours, na.rm = TRUE),
a =sum(duration_hours[view=="a"], na.r = TRUE),
b =sum(duration_hours[view=="b"], na.r = TRUE),
c =sum(duration_hours[view=="c"], na.r = TRUE))
But that did not yield the right amounts. What am I doing wrong?
Upvotes: 1
Views: 33
Reputation: 887048
The argument is na.rm, not na.r. Because na.r does not match any named argument of sum(), it is passed through ... as one more value to be summed: TRUE is coerced to 1 (FALSE would be coerced to 0), which adds 1 to the total, e.g.
sum(c(1, 2), na.r = TRUE)
#[1] 4
sum(c(1, 2), na.rm = TRUE)
#[1] 3
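To see where the misspelled argument ends up, here is a minimal sketch. The helper show_dots() is hypothetical (it is not part of base R); it only mimics the shape of sum()'s signature, where na.rm comes after ... and therefore has to be spelled out in full.

```r
# Hypothetical helper with the same argument layout as sum(..., na.rm = FALSE).
# It returns the names of everything that was captured by `...`.
show_dots <- function(..., na.rm = FALSE) names(list(...))

# The misspelled argument is not matched to na.rm, so it lands in `...`
show_dots(1, 2, na.r = TRUE)
#[1] ""     ""     "na.r"

# Spelled correctly, it is matched to na.rm and nothing in `...` is named
show_dots(1, 2, na.rm = TRUE)
#NULL

# So the misspelled call behaves like passing TRUE as one more value to add
sum(c(1, 2), TRUE)
#[1] 4
```

Arguments that follow ... in a function definition are not partially matched, which is why na.r is not silently treated as na.rm.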
The OP's corrected code would be
library(dplyr)
df1 %>%
group_by(date) %>%
summarise(duration_sum = sum(duration_hours, na.rm = TRUE),
a =sum(duration_hours[view=="a"], na.rm = TRUE),
b =sum(duration_hours[view=="b"], na.rm = TRUE),
c =sum(duration_hours[view=="c"], na.rm = TRUE))
# A tibble: 4 x 5
# date duration_sum a b c
#* <chr> <int> <int> <int> <int>
#1 2021-03-29 12 12 0 0
#2 2021-03-30 19 3 7 9
#3 2021-03-31 2 0 0 2
#4 2021-04-01 3 0 0 3
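Another option is pivot_wider, which creates one column per view without naming each view by hand (it does not add the duration_sum column):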
library(tidyr)
pivot_wider(df1, names_from = view, values_from = duration_hours,
values_fn = sum, values_fill = 0)
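If the duration_sum column is also wanted, one possible sketch is to add it after pivoting. This is an extension of the call above and not part of the original answer; it assumes that every column other than date is a view column, and it rebuilds df1 inline only so that the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, rebuilt here so the snippet is self-contained
df1 <- data.frame(
  view = c("a", "a", "a", "b", "b", "c", "c", "c"),
  duration_hours = c(5L, 7L, 3L, 2L, 5L, 9L, 2L, 3L),
  date = c("2021-03-29", "2021-03-29", "2021-03-30", "2021-03-30",
           "2021-03-30", "2021-03-30", "2021-03-31", "2021-04-01")
)

df2 <- df1 %>%
  # one column per view, summing the durations within each date
  pivot_wider(names_from = view, values_from = duration_hours,
              values_fn = sum, values_fill = 0) %>%
  # assumption: all non-date columns are views, so their row sum is the daily total
  mutate(duration_sum = rowSums(across(-date)), .after = date)

df2
```

Compared with the summarise() version, this does not need to be edited when a new view appears in the data.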
Data
df1 <- structure(list(view = c("a", "a", "a", "b", "b", "c", "c", "c"),
    duration_hours = c(5L, 7L, 3L, 2L, 5L, 9L, 2L, 3L),
    date = c("2021-03-29", "2021-03-29", "2021-03-30", "2021-03-30",
    "2021-03-30", "2021-03-30", "2021-03-31", "2021-04-01")),
    class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))
Upvotes: 2