Reputation: 8377
I have a measurement (for instance, solar radiation) indexed by a datetime variable with hourly timestamps. I want to sum the measurement values for each day of the year and match the result to another data source that is also at a daily scale (let's say mean outdoor temperature).
However, the second data source is already aggregated from 8:00am to 8:00am the next day. I know how to summarise my first variable by standard calendar day, but I need to do it from 8 to 8 so that both measurements match.
An example of my data:
set.seed(1L) # to create reproducible data
hourly = data.frame(datetime = seq(from = lubridate::ymd_hm("2017-01-01 01:00"),
                                   length.out = 168, by = "hour"),
                    value = rpois(168, 10))
daily = data.frame(datetime = seq(from = as.Date("2017-01-01"), length.out = 31, by = "day"),
                   value = rnorm(31))
Upvotes: 0
Views: 306
Reputation: 42592
Expanding my comment into an answer, it is worth noting that the OP has emphasized the words "aggregated from 8:00am to 8:00am the next day".
If a 24 hour period is not aligned with midnight, i.e., does not extend from 00:00 to 24:00 but starts and ends sometime during the day, it is ambiguous which date is associated with that period.
We can take either the date on which the period starts or the date on which the period ends.
Just to illustrate the difference:
# timestamps: 9 am, 10pm, 7 am next day
x <- lubridate::ymd_hm(c("2017-09-12 09:00", "2017-09-12 22:00", "2017-09-13 07:00"))
x
[1] "2017-09-12 09:00:00 UTC" "2017-09-12 22:00:00 UTC" "2017-09-13 07:00:00 UTC"
# map timestamps to date on which period starts by shifting back by 8 hours
x + lubridate::hours(-8L)
[1] "2017-09-12 01:00:00 UTC" "2017-09-12 14:00:00 UTC" "2017-09-12 23:00:00 UTC"
# map timestamps to date on which period ends by advancing by 16 hours
x + lubridate::hours(16L)
[1] "2017-09-13 01:00:00 UTC" "2017-09-13 14:00:00 UTC" "2017-09-13 23:00:00 UTC"
As there is no other information, let's assume that the daily data were mapped onto the day on which the period starts.
For grouping, aggregating, and merging, data.table is used:
library(data.table)
# aggregate data by shifted timestamp
setDT(hourly)[, .(sum.value = sum(value)),
by = .(date = as.Date(datetime + lubridate::hours(-8L)))]
         date sum.value
1: 2016-12-31        68
2: 2017-01-01       232
3: 2017-01-02       222
4: 2017-01-03       227
5: 2017-01-04       228
6: 2017-01-05       231
7: 2017-01-06       260
8: 2017-01-07       144
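If the daily data turn out to be labelled by the date on which the 8-to-8 period ends instead, the same aggregation should work with a shift of +16 hours. The following is only a minimal sketch under that assumption; it rebuilds the hourly data from the question with base R's as.POSIXct (instead of lubridate) and the name by_end is purely illustrative:

```r
library(data.table)

# same toy data as in the question, built with base R date-times
set.seed(1L)
hourly <- data.table(
  datetime = seq(from = as.POSIXct("2017-01-01 01:00", tz = "UTC"),
                 length.out = 168, by = "hour"),
  value = rpois(168, 10)
)

# ASSUMPTION: each 8am-to-8am period is labelled by the date on which it ENDS,
# so every timestamp is advanced by 16 hours (16 * 3600 seconds) before
# truncating to a date
by_end <- hourly[, .(sum.value = sum(value)),
                 by = .(date = as.Date(datetime + 16 * 3600))]
by_end
```

Compared with the start-date version, the sums per period stay the same; only the date label moves forward by one day.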
Note that the new date column, which is used for grouping and aggregating, is created on the fly in the by parameter (one of the reasons why I prefer data.table).
Now, the daily data need to be joined. By chaining, this can be combined in one statement:
setDT(hourly)[, .(sum.value = sum(value)),
by = .(date = as.Date(datetime + lubridate::hours(-8L)))][
setDT(daily), on = .(date = datetime), nomatch = 0L]
         date sum.value      value
1: 2017-01-01       232 -0.5080862
2: 2017-01-02       222  0.5236206
3: 2017-01-03       227  1.0177542
4: 2017-01-04       228 -0.2511646
5: 2017-01-05       231 -1.4299934
6: 2017-01-06       260  1.7091210
7: 2017-01-07       144  1.4350696
The parameter nomatch = 0L indicates that we want an inner join here.
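To see what nomatch changes, here is a small sketch on two made-up tables (agg and ref are hypothetical names and values, not the data from the question). Without nomatch = 0L, every row of the table given inside the brackets is kept and unmatched rows are filled with NA:

```r
library(data.table)

# hypothetical toy tables: 'agg' covers Jan 1-3, 'ref' covers Jan 2-4
agg <- data.table(date = as.Date("2017-01-01") + 0:2,
                  sum.value = c(10L, 20L, 30L))
ref <- data.table(datetime = as.Date("2017-01-02") + 0:2,
                  value = c(0.1, 0.2, 0.3))

# default (nomatch = NA): all rows of 'ref' are kept; Jan 4 gets sum.value = NA
right <- agg[ref, on = .(date = datetime)]
right

# nomatch = 0L: rows of 'ref' without a partner in 'agg' are dropped (inner join)
inner <- agg[ref, on = .(date = datetime), nomatch = 0L]
inner
```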
Upvotes: 1
Reputation: 47350
Using dplyr and just shifting the day by subtracting 8 hours:
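library(dplyr)
library(lubridate)
# shift each timestamp back by 8 hours so that an 8am-to-8am period maps to its start date
hourly %>% mutate(datetime = as_date(datetime - 8 * hours())) %>%
  rbind(daily) %>%
  group_by(datetime) %>%
  summarize_all(sum) %>%
  ungroup() %>%
  arrange(datetime)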
Result:
# A tibble: 32 x 2
datetime value
<date> <dbl>
1 2016-12-31 70.0000000
2 2017-01-01 218.6726454
3 2017-01-02 244.3821258
4 2017-01-03 257.7136326
5 2017-01-04 220.4788443
6 2017-01-05 230.3729744
7 2017-01-06 248.5082639
8 2017-01-07 176.5511818
9 2017-01-08 -0.8307824
10 2017-01-09 -0.6343781
# ... with 22 more rows
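Note that the pipeline above adds the hourly sums and the daily values together in one column. If the two measurements should instead sit side by side, one possible variant is to aggregate first and then join. This is only a sketch, assuming the data frames from the question; the names matched, date, and sum_value are made up for illustration:

```r
library(dplyr)
library(lubridate)

# toy data as in the question
set.seed(1L)
hourly <- data.frame(
  datetime = seq(from = ymd_hm("2017-01-01 01:00"), length.out = 168, by = "hour"),
  value = rpois(168, 10)
)
daily <- data.frame(
  datetime = seq(from = as.Date("2017-01-01"), length.out = 31, by = "day"),
  value = rnorm(31)
)

matched <- hourly %>%
  # ASSUMPTION: each 8am-to-8am period is labelled by its start date
  mutate(date = as_date(datetime - hours(8))) %>%
  group_by(date) %>%
  summarise(sum_value = sum(value)) %>%
  # keep only dates present in both sources; columns stay separate
  inner_join(daily, by = c("date" = "datetime"))
matched
```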
Upvotes: 1
Reputation: 3504
You could do it using cut, e.g.:
library(lubridate)
library(dplyr)
# 8:00 break points from the day before the first timestamp to the day after the last
brk = seq(
  ymd_hm(paste(as.Date(min(hourly$datetime) - days(1)), "08:00"), tz = "UTC"),
  ymd_hm(paste(as.Date(max(hourly$datetime) + days(1)), "08:00"), tz = "UTC"),
  by = "24 hours"
)
hourly$cut <- ymd_hms(cut.POSIXt(hourly$datetime, breaks = brk))
hourly2 <- hourly %>% group_by(cut) %>% summarize(value = sum(value))
hourly2$cut <- as.Date(hourly2$cut)
names(hourly2) <- names(daily)
comb <- rbind(hourly2, daily) %>% group_by(datetime) %>% summarize(value = sum(value))
datetime value
<date> <dbl>
1 2016-12-31 52.0000000
2 2017-01-01 241.5612137
3 2017-01-02 244.3689032
4 2017-01-03 271.3156334
5 2017-01-04 253.8221333
6 2017-01-05 238.5790170
7 2017-01-06 220.7118064
8 2017-01-07 167.5018586
9 2017-01-08 -0.2962494
10 2017-01-09 0.4126310
# ... with 22 more rows
Upvotes: 1