Reputation: 53
I have data as follows
Time <- c("2021-08-30 7:24","2021-08-30 7:30","2021-08-30 7:54","2021-08-30 8:16","2021-08-30 8:27","2021-08-30 8:22","2021-08-31 2:39","2021-08-31 2:44","2021-08-31 2:50","2021-08-31 2:56","2021-08-31 7:42","2021-08-31 7:45","2021-08-31 7:50","2021-08-31 6:02")
Distance_m <- c(162,162,162,162,162,162,162,157,150,137,122,102,78,42)
df <- data.frame(Time, Distance_m)
df
Time Distance_m
1 2021-08-30 7:24 162
2 2021-08-30 7:30 162
3 2021-08-30 7:54 162
4 2021-08-30 8:16 162
5 2021-08-30 8:27 162
6 2021-08-30 8:22 162
7 2021-08-31 2:39 162
8 2021-08-31 2:44 157
9 2021-08-31 2:50 150
10 2021-08-31 2:56 137
11 2021-08-31 7:42 122
12 2021-08-31 7:45 102
13 2021-08-31 7:50 78
14 2021-08-31 6:02 42
I Want to sum the Distance_m based on 15 minutes intervals based on date and hour.
I am expecting the output as follows
Date Hour Time Distance_m
2021-08-30 7 54 486
2021-08-30 8 30 486
2021-08-31 2 56 606
2021-08-31 6 2 344
So far I have tried
df <- tidyr::separate(df, Time, c("Date", "Time"), sep = " ")
df1<- df %>%
mutate(Time = hm(Time)) %>%
mutate(ttt= (lubridate::minute(Time) + lubridate::hour(Time) * 60)) %>%
mutate(tt = floor(ttt/15) ) %>%
group_by(tt) %>%
summarize(Date = last(Date),Time = last(Time), Distance_m = sum(Distance_m))
But the output is a bit messy. I am hoping to find an efficient way as I am dealing with a huge data.
Thank you
Upvotes: 0
Views: 1291
Reputation: 5887
You probably wonder the 1900 part, this is because when facetting ggplot still has the whole dates in mind, so you do not get them stacked nicely by the hour. When facetting it is also hard to give a start and an end for limits as they fall in a different day. An alternative is splitting as you suggested in dates and hours but that makes you less flexible and lose your timeline imo.
Time <- c("2021-08-30 7:24","2021-08-30 7:30","2021-08-30 7:54","2021-08-30 8:16","2021-08-30 8:27","2021-08-30 8:22","2021-08-31 2:39","2021-08-31 2:44","2021-08-31 2:50","2021-08-31 2:56","2021-08-31 7:42","2021-08-31 7:45","2021-08-31 7:50","2021-08-31 6:02")
Distance_m <- c(162,162,162,162,162,162,162,157,150,137,122,102,78,42)
df <- data.frame(Time, Distance_m)
library(data.table)
setDT(df)
df[, Time := ymd_hm(Time)]
df[, floor_date := lubridate::floor_date(Time, "15 minutes")]
df <- df[, .(Distance_m_sum = sum(Distance_m)), by = floor_date]
ggplot(df, aes(x= ymd_hms(paste("1900-01-01", str_sub(df$floor_date, 12))), y = Distance_m_sum, group = as.Date(floor_date))) + geom_line(size=1) + geom_point(size=3) +
facet_wrap(as.Date(floor_date) ~ ., ncol = 1) +
labs(title = "Daily Distance_m") +
expand_limits(x = c(ymd_h(1900010100), ymd_h(1900010200))) +
scale_x_datetime(date_breaks = "60 min", date_minor_breaks = "15 min", date_labels = "%H:%M", expand = c(0,0))
Upvotes: 1
Reputation: 388797
Base R option using cut
to divide data in 15 minutes interval and aggregate
to summarise the data.
df$Time <- as.POSIXct(df$Time, format = '%Y-%m-%d %H:%M', tz = 'UTC')
aggregate(Distance_m~Time_cut, transform(df, Time_cut = cut(Time, '15 mins')), sum)
# Time_cut Distance_m
#1 2021-08-30 07:24:00 324
#2 2021-08-30 07:54:00 162
#3 2021-08-30 08:09:00 324
#4 2021-08-30 08:24:00 162
#5 2021-08-31 02:39:00 469
#6 2021-08-31 02:54:00 137
#7 2021-08-31 05:54:00 42
#8 2021-08-31 07:39:00 302
Upvotes: 1
Reputation: 5887
Not exactly giving your expected results though, but perhaps usable. You can see if this fits your needs.
library(data.table)
setDT(df)
df[, Time := ymd_hm(Time)]
df[, groups := lubridate::round_date(Time, "15 minutes")]
df[, .(Distance_m_sum = sum(Distance_m)), by = groups]
groups Distance_m_sum
1: 2021-08-30 07:30:00 324
2: 2021-08-30 08:00:00 162
3: 2021-08-30 08:15:00 324
4: 2021-08-30 08:30:00 162
5: 2021-08-31 02:45:00 469
6: 2021-08-31 03:00:00 137
7: 2021-08-31 07:45:00 302
8: 2021-08-31 06:00:00 42
You have to define your quarters I think, there are with the lubridate approach three options, round_date, floor_date and ceiling_date. Rethinking my own example I would pick floor_date as 2021-08-30 7:24 falls in the 7:15-7:30 group. To see all variants:
library(data.table)
setDT(df)
df[, Time := ymd_hm(Time)]
df[, round_date := lubridate::round_date(Time, "15 minutes")]
df[, floor_date := lubridate::floor_date(Time, "15 minutes")]
df[, ceiling_date := lubridate::ceiling_date(Time, "15 minutes")]
df[, .(Distance_m_sum = sum(Distance_m)), by = round_date]
round_date Distance_m_sum
1: 2021-08-30 07:30:00 324
2: 2021-08-30 08:00:00 162
3: 2021-08-30 08:15:00 324
4: 2021-08-30 08:30:00 162
5: 2021-08-31 02:45:00 469
6: 2021-08-31 03:00:00 137
7: 2021-08-31 07:45:00 302
8: 2021-08-31 06:00:00 42
df[, .(Distance_m_sum = sum(Distance_m)), by = floor_date]
floor_date Distance_m_sum
1: 2021-08-30 07:15:00 162
2: 2021-08-30 07:30:00 162
3: 2021-08-30 07:45:00 162
4: 2021-08-30 08:15:00 486
5: 2021-08-31 02:30:00 319
6: 2021-08-31 02:45:00 287
7: 2021-08-31 07:30:00 122
8: 2021-08-31 07:45:00 180
9: 2021-08-31 06:00:00 42
df[, .(Distance_m_sum = sum(Distance_m)), by = ceiling_date]
ceiling_date Distance_m_sum
1: 2021-08-30 07:30:00 324
2: 2021-08-30 08:00:00 162
3: 2021-08-30 08:30:00 486
4: 2021-08-31 02:45:00 319
5: 2021-08-31 03:00:00 287
6: 2021-08-31 07:45:00 224
7: 2021-08-31 08:00:00 78
8: 2021-08-31 06:15:00 42
Upvotes: 3