Kee
Kee

Reputation: 163

Using two summarise function in r

 library(lubridate)
 library(tidyverse)
 step_count_raw <- read_csv("data/step-count/step-count.csv",
 locale = locale(tz = "Australia/Melbourne"))
 location <- read_csv("data/step-count/location.csv")
 step_count <- step_count_raw %>% 
 rename_with(~ c("date_time", "date", "count")) %>% 
 left_join(location) %>% 
 mutate(location = replace_na(location, "Melbourne"))
 step_count

 #> # A tibble: 5,448 x 4
 #>    date_time           date       count location 
 #>    <dttm>              <date>     <dbl> <chr>    
 #>  1 2019-01-01 09:00:00 2019-01-01   764 Melbourne
 #>  2 2019-01-01 10:00:00 2019-01-01   913 Melbourne
 #>  3 2019-01-02 00:00:00 2019-01-02     9 Melbourne
 #>  4 2019-01-02 10:00:00 2019-01-02  2910 Melbourne
 #>  5 2019-01-02 11:00:00 2019-01-02  1390 Melbourne
 #>  6 2019-01-02 12:00:00 2019-01-02  1020 Melbourne
 #>  7 2019-01-02 13:00:00 2019-01-02   472 Melbourne
 #>  8 2019-01-02 15:00:00 2019-01-02  1220 Melbourne
 #>  9 2019-01-02 16:00:00 2019-01-02  1670 Melbourne
 #> 10 2019-01-02 17:00:00 2019-01-02  1390 Melbourne
 #> # … with 5,438 more rows

I want to calculate average daily step counts for every location, from step_count. Then end up with a tibble called city_avg_steps.

expected output

 #> # A tibble: 4 x 2
 #>   location      avg_count
 #>   <chr>             <dbl>
 #> 1 Austin            7738.
 #> 2 Denver           12738.
 #> 3 Melbourne         7912.
 #> 4 San Francisco    13990.

My code and output

  city_avg_steps <- step_count%>%group_by(location)%>%summarise(avg_count=mean(count))
  city_avg_steps
  
  # A tibble: 4 x 2
  location      avg_count
  <chr>             <dbl>
  1 Austin             721.
  2 Denver             650.
  3 Melbourne          530.
  4 San Francisco      654.

I have a clue is to calculate daily number first then cumulate the result using two summarise fuction,but not sure how to add.

Upvotes: 0

Views: 102

Answers (1)

Mouad_Seridi
Mouad_Seridi

Reputation: 2716

As @dash2 explains in the comments, from what we understand from your desired output, it requires a two stage aggregation, one to aggregate the number of steps per day (adding them together, using sum), the other is aggregating the different days into location level averages, using mean.

step_count %>% 
  group_by(date, location) %>% 
  summarise(sum_steps = sum(count, na.rm = TRUE)) %>% 
  ungroup %>% 
  group_by(date) %>%
  summarise(avg_steps = mean(sum_steps, na.rm = TRUE))

Upvotes: 1

Related Questions