philiporlando

Reputation: 1000

Grouping a dataframe by time with "nice" breaks using dplyr

Intro:

I would like to aggregate some 5-minute data into 10-minute data. Specifically, I only want to aggregate on the 10-minute marks (00:10:00, 00:20:00, 00:30:00, etc.).

The code below almost achieves this, but the breaks fall on the 5-minute marks instead of the 10-minute marks (00:05:00, 00:15:00, 00:25:00). I think cut() is using the timestamp of the first row in the dataframe to determine the cutpoints.

Is there a way to achieve "nice" 10-minute breaks using cut() from base and group_by() from dplyr? I would be okay with just removing the first row of data, but the solution really needs to handle many different files, each with a unique starting point.

Thanks in advance!

Example Code:

library(dplyr)

date <- c("2017-06-14 14:35:00", "2017-06-14 14:40:00", "2017-06-14 14:45:00", "2017-06-14 14:50:00")
co <- as.numeric(c(5.17,10.07,13.88,13.78))
no <- as.numeric(c(34.98,32.45,31.34,29.09))
no2 <- as.numeric(c(0.00,0.00,0.00,0.00))
o3 <- as.numeric(c(5.17,10.07,13.88,13.78))

data <- data.frame(date, co, no, no2, o3)
# parse the timestamps into POSIXct date-times
data$date <- strptime(data$date, format = "%Y-%m-%d %H:%M")
data$date <- as.POSIXct(data$date)

head(data)

# aggregate to 10-minute means; cut() anchors the breaks at the first timestamp (14:35)
data_10min <- data %>% 
  group_by(date = cut(date, breaks = "10 min")) %>%
  summarize(co = mean(co), no = mean(no), no2 = mean(no2), o3 = mean(o3))

head(data_10min)

Desired Output:

2017-06-14 14:40:00
2017-06-14 14:50:00

Upvotes: 1

Views: 486

Answers (1)

raymkchow

Reputation: 949

Just adding 300 seconds to the date column inside group_by() gives the desired result.

library(magrittr)
library(dplyr)

# shift every timestamp forward by 300 s (5 min) so cut()'s breaks land on the 10-minute marks
data_10min <- data %>% 
  group_by(date = cut(date + 300, breaks = "10 min")) %>%
  summarize_each(funs(mean))

data_10min

The result:

# # A tibble: 2 × 5
#                  date    co     no   no2    o3
#                <fctr> <dbl>  <dbl> <dbl> <dbl>
# 1 2017-06-14 14:40:00  7.62 33.715     0  7.62
# 2 2017-06-14 14:50:00 13.83 30.215     0 13.83
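
Note that summarize_each() and funs() have since been deprecated in dplyr. The sketch below shows the same 10-minute aggregation written for a current dplyr (assuming version >= 1.0), and it swaps the 300-second shift for lubridate's ceiling_date(), which rounds each timestamp up to the next 10-minute mark no matter where a file starts; treat it as an untested alternative under those assumptions, not part of the original answer.

library(dplyr)
library(lubridate)

# round each timestamp up to the next 10-minute mark, then average every pollutant column
data_10min <- data %>%
  group_by(date = ceiling_date(date, "10 minutes")) %>%
  summarize(across(c(co, no, no2, o3), mean))

data_10min

With the example data this also yields the 14:40:00 and 14:50:00 rows, because 14:35 and 14:45 are rounded up while 14:40 and 14:50 already sit on a 10-minute boundary and are left unchanged.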

Upvotes: 1
