noob_researcher
noob_researcher

Reputation: 67

group by many columns and calculate mean

I'm working with hourly polutant data measured in various air quality stations. I want to have the daily average for all stations in one DF.

date site a site b site c
2010-01-01 00:00:00 3 2 1
2010-01-01 01:00:00 2 2 2
2010-01-01 02:00:00 1 2 3

and I would like:

date site a site b site c
2010-01-01 2 2 2
daylyavg <- df %>%
  group_by(Month, Day) %>%
  summarize(Avg_a = mean(site a))
Month day Avg_a
1 1 2

Upvotes: 1

Views: 38

Answers (1)

Jon Spring
Jon Spring

Reputation: 66445

Here's an approach using dplyr::across:

library(tidyverse); library(lubridate)

my_data %>%
  mutate(date = ymd_hms(date)) %>%  # in case not already POSIXct date-time
  
  mutate(month = month(date),
         day = day(date)) %>%

  group_by(month, day) %>%
  summarize(across(starts_with("site"), mean,
                   .names = "Avg_{.col}")) %>%
  ungroup()

Result:

  month   day Avg_site.a Avg_site.b Avg_site.c
  <dbl> <int>      <dbl>      <dbl>      <dbl>
1     1     1          2          2          2

Data:

my_data <- data.frame(
        date = c("2010-01-01 00:00:00",
                 "2010-01-01 01:00:00","2010-01-01 02:00:00"),
      site.a = c(3L, 2L, 1L),
      site.b = c(2L, 2L, 2L),
      site.c = c(1L, 2L, 3L)
)

Upvotes: 2

Related Questions