Reputation: 163
library(lubridate)
library(tidyverse)
step_count_raw <- read_csv("data/step-count/step-count.csv",
locale = locale(tz = "Australia/Melbourne"))
location <- read_csv("data/step-count/location.csv")
step_count <- step_count_raw %>%
rename_with(~ c("date_time", "date", "count")) %>%
left_join(location) %>%
mutate(location = replace_na(location, "Melbourne"))
step_count
#> # A tibble: 5,448 x 4
#> date_time date count location
#> <dttm> <date> <dbl> <chr>
#> 1 2019-01-01 09:00:00 2019-01-01 764 Melbourne
#> 2 2019-01-01 10:00:00 2019-01-01 913 Melbourne
#> 3 2019-01-02 00:00:00 2019-01-02 9 Melbourne
#> 4 2019-01-02 10:00:00 2019-01-02 2910 Melbourne
#> 5 2019-01-02 11:00:00 2019-01-02 1390 Melbourne
#> 6 2019-01-02 12:00:00 2019-01-02 1020 Melbourne
#> 7 2019-01-02 13:00:00 2019-01-02 472 Melbourne
#> 8 2019-01-02 15:00:00 2019-01-02 1220 Melbourne
#> 9 2019-01-02 16:00:00 2019-01-02 1670 Melbourne
#> 10 2019-01-02 17:00:00 2019-01-02 1390 Melbourne
#> # … with 5,438 more rows
I want to calculate average daily step counts for every location, from step_count. Then end up with a tibble called city_avg_steps.
expected output
#> # A tibble: 4 x 2
#> location avg_count
#> <chr> <dbl>
#> 1 Austin 7738.
#> 2 Denver 12738.
#> 3 Melbourne 7912.
#> 4 San Francisco 13990.
My code and output
city_avg_steps <- step_count%>%group_by(location)%>%summarise(avg_count=mean(count))
city_avg_steps
# A tibble: 4 x 2
location avg_count
<chr> <dbl>
1 Austin 721.
2 Denver 650.
3 Melbourne 530.
4 San Francisco 654.
I have a clue is to calculate daily number first then cumulate the result using two summarise fuction,but not sure how to add.
Upvotes: 0
Views: 102
Reputation: 2716
As @dash2 explains in the comments, from what we understand from your desired output, it requires a two stage aggregation, one to aggregate the number of steps per day (adding them together, using sum
), the other is aggregating the different days into location level averages, using mean
.
step_count %>%
group_by(date, location) %>%
summarise(sum_steps = sum(count, na.rm = TRUE)) %>%
ungroup %>%
group_by(date) %>%
summarise(avg_steps = mean(sum_steps, na.rm = TRUE))
Upvotes: 1