Reputation: 1000
Intro:
I would like to aggregate some 5-minute data into 10-minute data. Specifically, I only want to aggregate on the 10-minute marks (00:10:00, 00:20:00, 00:30:00, etc.).
The code below almost achieves this, but the breaks fall on the 5-minute marks (00:05:00, 00:15:00, 00:25:00) instead of the 10-minute marks. I think cut() is taking the earliest timestamp in the data frame as the first cutpoint.
Are there any ways to achieve "nice" 10-minute breaks using cut() {base} and group_by() {dplyr}? I would be okay with just removing the first row of data, but I need the solution to handle many different files, each with a different starting point.
Thanks in advance!
Example Code:
library(dplyr)  # provides %>%, group_by() and summarize()

date <- c("2017-06-14 14:35:00", "2017-06-14 14:40:00", "2017-06-14 14:45:00", "2017-06-14 14:50:00")
co <- c(5.17, 10.07, 13.88, 13.78)
no <- c(34.98, 32.45, 31.34, 29.09)
no2 <- c(0.00, 0.00, 0.00, 0.00)
o3 <- c(5.17, 10.07, 13.88, 13.78)
data <- data.frame(date, co, no, no2, o3)
# Parse the timestamps, including the seconds field
data$date <- as.POSIXct(data$date, format = "%Y-%m-%d %H:%M:%S")
head(data)

# cut() starts its breaks at the earliest timestamp, hence the 5-minute marks
data_10min <- data %>%
  group_by(date = cut(date, breaks = "10 min")) %>%
  summarize(co = mean(co), no = mean(no), no2 = mean(no2), o3 = mean(o3))
head(data_10min)
Desired Output:
2017-06-14 14:40:00
2017-06-14 14:50:00
Upvotes: 1
Views: 486
Reputation: 949
Just adding 300 seconds to the date column inside group_by() gets the desired result.
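library(dplyr)  # also re-exports %>% from magrittr

# `data` is the data frame built in the question
data_10min <- data %>%
  group_by(date = cut(as.POSIXct(date) + 300, breaks = "10 min")) %>%
  # across() replaces the deprecated summarize_each(funs(mean))
  summarize(across(everything(), mean))
data_10min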
The result:
# A tibble: 2 × 5
#                  date    co     no   no2    o3
#                <fctr> <dbl>  <dbl> <dbl> <dbl>
# 1 2017-06-14 14:40:00  7.62 33.715     0  7.62
# 2 2017-06-14 14:50:00 13.83 30.215     0 13.83
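One caveat: the 300-second shift still relies on cut() starting its breaks at the earliest timestamp, so a file whose first row already sits on a 10-minute mark (say 14:40:00) would end up with breaks on the 5-minute marks again. A possible alternative, sketched here and not part of the original answer, is to round every timestamp up to the next 10-minute mark with plain arithmetic, so the grouping no longer depends on where the file starts. The helper name ceiling_to() is made up for illustration, and the sketch assumes the timestamps are in UTC (or another zone whose offset is a multiple of 10 minutes).

```r
library(dplyr)

# Same sample data as in the question; tz = "UTC" is an assumption of this sketch.
data <- data.frame(
  date = as.POSIXct(c("2017-06-14 14:35:00", "2017-06-14 14:40:00",
                      "2017-06-14 14:45:00", "2017-06-14 14:50:00"), tz = "UTC"),
  co  = c(5.17, 10.07, 13.88, 13.78),
  no  = c(34.98, 32.45, 31.34, 29.09),
  no2 = c(0, 0, 0, 0),
  o3  = c(5.17, 10.07, 13.88, 13.78)
)

# Hypothetical helper: round each timestamp up to the next multiple of `secs`
# seconds since the epoch. Timestamps already on a mark are left unchanged.
ceiling_to <- function(x, secs = 600) {
  as.POSIXct(ceiling(as.numeric(x) / secs) * secs,
             origin = "1970-01-01", tz = "UTC")
}

# Group on the rounded timestamp instead of on cut() labels.
data_10min <- data %>%
  group_by(date = ceiling_to(date)) %>%
  summarise(across(everything(), mean))

data_10min
```

Unlike cut(), this keeps the date column as POSIXct rather than a factor, which may be convenient if the result is plotted or joined later. If lubridate is available, its ceiling_date() function with a "10 minutes" unit looks like it could do the same rounding, though that is worth checking against its documentation.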
Upvotes: 1