Group by multiple variables and summarise dplyr

Question

I'm trying to average CO2 concentration data every 30 seconds, for each of my sensors:

    head(df)
# A tibble: 6 x 7
# Groups: BinnedTime [1]

  Sensor Date       Time   calCO2 DeviceTime          cuts   BinnedTime         
                                        
1 N1     2019-02-12 13:24     400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
2 N1     2019-02-12 13:24     400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
3 N1     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:03
4 N2     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
5 N3     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
6 N3     2019-02-12 13:24     400 2019-02-12 13:24:05 (0,10] 2019-02-12 13:24:04

I use:

df %>%
  group_by(Sensor)%>%
  group_by(BinnedTime = cut(DeviceTime, breaks="30 sec")) %>%
  summarize(Concentration = mean(calCO2))

But it doesn't group by Sensor first, it ignores them and calculates the average over the BinnedTime instead. Any thoughts would be welcomed.

I've read about .dots=c("Sensor","BinnedTime") but this doesn't work.

Note, I haven't created dummy data so you can see exactly what mine looks like, as there seem to be some subtleties with time and date that I can't quite get my head around.

GGAnderson · Accepted Answer

So to summarize the comments by @kath with some improvements to address your follow-on question:

df %>%
    group_by(Sensor, BinnedTime = cut(DeviceTime, breaks="30 sec")) %>%
        mutate(Concentration = mean(calCO2)) %>%
    ungroup()

The above will maintain all columns, but duplicate the Concentration calculation for each row of the df. An alternative that would allow you to both roll up and retain more columns of interest is to simply add them to the summarize operation, as illustrated below.

    df %>%
    group_by(Sensor, BinnedTime = cut(DeviceTime, breaks="30 sec")) %>%
        summarize(Concentration = mean(calCO2),
                   Date = min(Date),
                   Time = min(Time),
                   StartDeviceTime = min(DeviceTime),
                   EndDeviceTime = max(DeviceTime))

Group by multiple variables and summarise dplyr

Answers (1)

Related Questions