Sorath Abbasi
Sorath Abbasi

Reputation: 53

Aggregating data at 15 minutes interval based on date and hour in R

I have data as follows

Time <- c("2021-08-30 7:24","2021-08-30 7:30","2021-08-30 7:54","2021-08-30 8:16","2021-08-30 8:27","2021-08-30 8:22","2021-08-31 2:39","2021-08-31 2:44","2021-08-31 2:50","2021-08-31 2:56","2021-08-31 7:42","2021-08-31 7:45","2021-08-31 7:50","2021-08-31 6:02")
Distance_m <- c(162,162,162,162,162,162,162,157,150,137,122,102,78,42)
df <- data.frame(Time, Distance_m)
df
              Time Distance_m
1  2021-08-30 7:24        162
2  2021-08-30 7:30        162
3  2021-08-30 7:54        162
4  2021-08-30 8:16        162
5  2021-08-30 8:27        162
6  2021-08-30 8:22        162
7  2021-08-31 2:39        162
8  2021-08-31 2:44        157
9  2021-08-31 2:50        150
10 2021-08-31 2:56        137
11 2021-08-31 7:42        122
12 2021-08-31 7:45        102
13 2021-08-31 7:50         78
14 2021-08-31 6:02         42

I Want to sum the Distance_m based on 15 minutes intervals based on date and hour.

I am expecting the output as follows

Date    Hour    Time    Distance_m
2021-08-30  7   54  486
2021-08-30  8   30  486
2021-08-31  2   56  606
2021-08-31  6   2   344

So far I have tried

df <- tidyr::separate(df, Time, c("Date", "Time"), sep = " ")
df1<- df %>%
  mutate(Time = hm(Time)) %>%
  mutate(ttt= (lubridate::minute(Time) + lubridate::hour(Time) * 60)) %>%
  mutate(tt = floor(ttt/15) ) %>%
  group_by(tt) %>%
  summarize(Date = last(Date),Time = last(Time), Distance_m = sum(Distance_m))

But the output is a bit messy. I am hoping to find an efficient way as I am dealing with a huge data.

Thank you

Upvotes: 0

Views: 1291

Answers (3)

Merijn van Tilborg
Merijn van Tilborg

Reputation: 5887

You probably wonder the 1900 part, this is because when facetting ggplot still has the whole dates in mind, so you do not get them stacked nicely by the hour. When facetting it is also hard to give a start and an end for limits as they fall in a different day. An alternative is splitting as you suggested in dates and hours but that makes you less flexible and lose your timeline imo.

Time <- c("2021-08-30 7:24","2021-08-30 7:30","2021-08-30 7:54","2021-08-30 8:16","2021-08-30 8:27","2021-08-30 8:22","2021-08-31 2:39","2021-08-31 2:44","2021-08-31 2:50","2021-08-31 2:56","2021-08-31 7:42","2021-08-31 7:45","2021-08-31 7:50","2021-08-31 6:02")
Distance_m <- c(162,162,162,162,162,162,162,157,150,137,122,102,78,42)
df <- data.frame(Time, Distance_m)

library(data.table)
setDT(df)

df[, Time := ymd_hm(Time)]
df[, floor_date := lubridate::floor_date(Time, "15 minutes")]
df <- df[, .(Distance_m_sum = sum(Distance_m)), by = floor_date]

ggplot(df, aes(x= ymd_hms(paste("1900-01-01", str_sub(df$floor_date, 12))), y = Distance_m_sum, group = as.Date(floor_date))) + geom_line(size=1) + geom_point(size=3) +
  facet_wrap(as.Date(floor_date) ~ ., ncol = 1) + 
  labs(title = "Daily Distance_m") +
  expand_limits(x = c(ymd_h(1900010100), ymd_h(1900010200))) +
  scale_x_datetime(date_breaks = "60 min", date_minor_breaks = "15 min", date_labels = "%H:%M", expand = c(0,0))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388797

Base R option using cut to divide data in 15 minutes interval and aggregate to summarise the data.

df$Time <- as.POSIXct(df$Time, format = '%Y-%m-%d %H:%M', tz = 'UTC')
aggregate(Distance_m~Time_cut, transform(df, Time_cut = cut(Time, '15 mins')), sum)

#             Time_cut Distance_m
#1 2021-08-30 07:24:00        324
#2 2021-08-30 07:54:00        162
#3 2021-08-30 08:09:00        324
#4 2021-08-30 08:24:00        162
#5 2021-08-31 02:39:00        469
#6 2021-08-31 02:54:00        137
#7 2021-08-31 05:54:00         42
#8 2021-08-31 07:39:00        302

Upvotes: 1

Merijn van Tilborg
Merijn van Tilborg

Reputation: 5887

Not exactly giving your expected results though, but perhaps usable. You can see if this fits your needs.

library(data.table)
setDT(df)

df[, Time := ymd_hm(Time)]
df[, groups := lubridate::round_date(Time, "15 minutes")]
df[, .(Distance_m_sum = sum(Distance_m)), by = groups]

               groups Distance_m_sum
1: 2021-08-30 07:30:00            324
2: 2021-08-30 08:00:00            162
3: 2021-08-30 08:15:00            324
4: 2021-08-30 08:30:00            162
5: 2021-08-31 02:45:00            469
6: 2021-08-31 03:00:00            137
7: 2021-08-31 07:45:00            302
8: 2021-08-31 06:00:00             42

More extended example

You have to define your quarters I think, there are with the lubridate approach three options, round_date, floor_date and ceiling_date. Rethinking my own example I would pick floor_date as 2021-08-30 7:24 falls in the 7:15-7:30 group. To see all variants:

library(data.table)
setDT(df)

df[, Time := ymd_hm(Time)]
df[, round_date := lubridate::round_date(Time, "15 minutes")]
df[, floor_date := lubridate::floor_date(Time, "15 minutes")]
df[, ceiling_date := lubridate::ceiling_date(Time, "15 minutes")]

df[, .(Distance_m_sum = sum(Distance_m)), by = round_date]
            round_date Distance_m_sum
1: 2021-08-30 07:30:00            324
2: 2021-08-30 08:00:00            162
3: 2021-08-30 08:15:00            324
4: 2021-08-30 08:30:00            162
5: 2021-08-31 02:45:00            469
6: 2021-08-31 03:00:00            137
7: 2021-08-31 07:45:00            302
8: 2021-08-31 06:00:00             42

df[, .(Distance_m_sum = sum(Distance_m)), by = floor_date]
        floor_date Distance_m_sum
1: 2021-08-30 07:15:00            162
2: 2021-08-30 07:30:00            162
3: 2021-08-30 07:45:00            162
4: 2021-08-30 08:15:00            486
5: 2021-08-31 02:30:00            319
6: 2021-08-31 02:45:00            287
7: 2021-08-31 07:30:00            122
8: 2021-08-31 07:45:00            180
9: 2021-08-31 06:00:00             42

df[, .(Distance_m_sum = sum(Distance_m)), by = ceiling_date]
          ceiling_date Distance_m_sum
1: 2021-08-30 07:30:00            324
2: 2021-08-30 08:00:00            162
3: 2021-08-30 08:30:00            486
4: 2021-08-31 02:45:00            319
5: 2021-08-31 03:00:00            287
6: 2021-08-31 07:45:00            224
7: 2021-08-31 08:00:00             78
8: 2021-08-31 06:15:00             42

Upvotes: 3

Related Questions