Reputation: 2263
I'm trying to average CO2 concentration data every 30 seconds, for each of my sensors:
head(df)
# A tibble: 6 x 7
# Groups: BinnedTime [1]
Sensor Date Time calCO2 DeviceTime cuts BinnedTime
<fctr> <date> <time> <dbl> <dttm> <fctr> <chr>
1 N1 2019-02-12 13:24 400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
2 N1 2019-02-12 13:24 400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
3 N1 2019-02-12 13:24 400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:03
4 N2 2019-02-12 13:24 400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
5 N3 2019-02-12 13:24 400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
6 N3 2019-02-12 13:24 400 2019-02-12 13:24:05 (0,10] 2019-02-12 13:24:04
I use:
df %>%
group_by(Sensor)%>%
group_by(BinnedTime = cut(DeviceTime, breaks="30 sec")) %>%
summarize(Concentration = mean(calCO2))
But it doesn't group by Sensor first, it ignores them and calculates the average over the BinnedTime instead. Any thoughts would be welcomed.
I've read about .dots=c("Sensor","BinnedTime")
but this doesn't work.
Note, I haven't created dummy data so you can see exactly what mine looks like, as there seem to be some subtleties with time and date that I can't quite get my head around.
Upvotes: 1
Views: 2158
Reputation: 2210
So to summarize the comments by @kath with some improvements to address your follow-on question:
df %>%
group_by(Sensor, BinnedTime = cut(DeviceTime, breaks="30 sec")) %>%
mutate(Concentration = mean(calCO2)) %>%
ungroup()
The above will maintain all columns, but duplicate the Concentration calculation for each row of the df. An alternative that would allow you to both roll up and retain more columns of interest is to simply add them to the summarize operation, as illustrated below.
df %>%
group_by(Sensor, BinnedTime = cut(DeviceTime, breaks="30 sec")) %>%
summarize(Concentration = mean(calCO2),
Date = min(Date),
Time = min(Time),
StartDeviceTime = min(DeviceTime),
EndDeviceTime = max(DeviceTime))
Upvotes: 1