agenis

Reputation: 8377

Merge two daily time series after summarising on shifted hours

I have a measurement (for instance, solar radiation) indexed by a datetime variable at hourly resolution. I want to sum the measurement for each day of the year and match the result to another data source that is also at daily scale (say, mean outdoor temperature).

However, the second data source is already aggregated from 8:00am to 8:00am the next day. I know how to summarise my first variable by calendar day, but I need to do it from 8 to 8 so that both measurements match.

An example of my data:

set.seed(1L) # to create reproducible data
hourly = data.frame(datetime = seq(from = lubridate::ymd_hm("2017-01-01 01:00"), 
                                   length.out = 168, by = "hour"),
                    value = rpois(168, 10))
daily = data.frame(datetime = seq(from=as.Date("2017-01-01"), length.out = 31, by="day"),
                   value=rnorm(31))

Upvotes: 0

Views: 306

Answers (3)

Uwe

Reputation: 42592

Expanding my comment into an answer: it is worth noting that the OP emphasized the words "aggregated from 8:00am to 8:00am the next day".

Mapping non-aligned 24-hour periods to dates

If a 24-hour period is not aligned with midnight, i.e., it does not run from 00:00 to 24:00 but starts and ends at some other time of day, it is ambiguous which date should be associated with that period.

We can take either

  1. the date of the day on which the period starts,
  2. the date of the day on which the period ends, or
  3. the date of the day which contains the majority of hours of the period.

Just to illustrate the difference:

# timestamps: 9 am, 10pm, 7 am next day 
x <- lubridate::ymd_hm(c("2017-09-12 09:00", "2017-09-12 22:00", "2017-09-13 07:00"))
x
[1] "2017-09-12 09:00:00 UTC" "2017-09-12 22:00:00 UTC" "2017-09-13 07:00:00 UTC"
# map timestamps to date on which period starts by shifting back by 8 hours
x + lubridate::hours(-8L)
[1] "2017-09-12 01:00:00 UTC" "2017-09-12 14:00:00 UTC" "2017-09-12 23:00:00 UTC"
# map timestamps to date on which period ends by advancing by 16 hours
x + lubridate::hours(16L)
[1] "2017-09-13 01:00:00 UTC" "2017-09-13 14:00:00 UTC" "2017-09-13 23:00:00 UTC"
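
To make the mapping explicit, the shifted timestamps can be truncated to dates. The following is a minimal base R sketch, not part of the original analysis; the helper name `period_date` is hypothetical. It also illustrates option 3: an 8-to-8 period has 16 of its 24 hours on the start day, so for this offset the majority rule gives the same date as option 1.

```r
# Same three timestamps as above, built with base R instead of lubridate
x <- as.POSIXct(c("2017-09-12 09:00", "2017-09-12 22:00", "2017-09-13 07:00"),
                tz = "UTC")

# Hypothetical helper: map timestamps to the date of the 24-hour period
# that starts at `offset_hours` o'clock
period_date <- function(x, offset_hours = 8, rule = c("start", "end")) {
  rule <- match.arg(rule)
  # shift back to land on the start day, or forward to land on the end day
  shift <- if (rule == "start") -offset_hours else 24 - offset_hours
  as.Date(x + shift * 3600, tz = "UTC")
}

start_date <- period_date(x, rule = "start")  # option 1
end_date   <- period_date(x, rule = "end")    # option 2

# Option 3 (majority of hours): 16 hours lie on the start day, 8 on the next,
# so for an 08:00 offset it coincides with option 1
majority_date <- start_date
```

All three timestamps belong to the same period, so each rule returns a single date for them: 2017-09-12 for the start rule and 2017-09-13 for the end rule.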

As there is no other information, let's assume that the daily data were mapped onto the day on which the period starts.

Aggregating and merging

data.table is used for grouping, aggregating, and merging:

library(data.table)
# aggregate data by shifted timestamp
setDT(hourly)[, .(sum.value = sum(value)), 
              by = .(date = as.Date(datetime + lubridate::hours(-8L)))]
         date sum.value
1: 2016-12-31        68
2: 2017-01-01       232
3: 2017-01-02       222
4: 2017-01-03       227
5: 2017-01-04       228
6: 2017-01-05       231
7: 2017-01-06       260
8: 2017-01-07       144
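
The first and last sums are noticeably smaller than the others. A quick check of how many hourly rows fall into each shifted day suggests why: the sample data start at 01:00 and cover exactly 168 hours, so the periods at both ends are incomplete. This is a small base R sketch that only rebuilds the timestamps of the example data:

```r
# Rebuild the hourly timestamps of the example (values are not needed here)
datetime <- seq(from = as.POSIXct("2017-01-01 01:00", tz = "UTC"),
                length.out = 168, by = "hour")

# Shift back by 8 hours and count the hourly rows per resulting date
shifted_date <- as.Date(datetime - 8 * 3600, tz = "UTC")
hours_per_group <- table(shifted_date)
hours_per_group
```

Only the six groups from 2017-01-01 to 2017-01-06 contain a full 24 hours. Whether incomplete periods should be dropped before merging depends on the use case.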

Note that the new date column used for grouping and aggregating is created on the fly in the by parameter (one of the reasons why I prefer data.table).

Now, the daily data need to be joined. By chaining, this can be combined into one statement:

setDT(hourly)[, .(sum.value = sum(value)), 
              by = .(date = as.Date(datetime + lubridate::hours(-8L)))][
                setDT(daily), on = .(date = datetime), nomatch = 0L]
         date sum.value      value
1: 2017-01-01       232 -0.5080862
2: 2017-01-02       222  0.5236206
3: 2017-01-03       227  1.0177542
4: 2017-01-04       228 -0.2511646
5: 2017-01-05       231 -1.4299934
6: 2017-01-06       260  1.7091210
7: 2017-01-07       144  1.4350696

The parameter nomatch = 0L indicates that we want an inner join here.
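
To see what nomatch = 0L changes, here is a minimal sketch on two tiny made-up tables (the values are illustrative only, not taken from the question). Without nomatch = 0L, every row of the table inside the brackets is kept and unmatched columns are filled with NA:

```r
library(data.table)

# Illustrative toy tables with one overlapping date (2017-01-02)
agg   <- data.table(date = as.Date("2017-01-01") + 0:1, sum.value = c(10, 20))
daily <- data.table(datetime = as.Date("2017-01-01") + 1:2, value = c(0.5, 1.5))

# Inner join: only dates present in both tables are returned
inner <- agg[daily, on = .(date = datetime), nomatch = 0L]

# Default behaviour: all rows of `daily` are kept, sum.value is NA if unmatched
all_rows <- agg[daily, on = .(date = datetime)]
```

Here inner has one row (2017-01-02), while all_rows has two, the second with a missing sum.value.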

Upvotes: 1

moodymudskipper

Reputation: 47350

Using dplyr, and simply shifting the day by subtracting 8 hours:

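library(dplyr)     # %>%, mutate(), group_by(), summarize_all(), arrange()
library(lubridate) # as_date(), hours()
# shift timestamps back by 8 hours, then keep only the date
hourly %>% mutate(datetime = as_date(datetime - 8 * hours())) %>%
  rbind(daily) %>%
  group_by(datetime)  %>%
  summarize_all(sum) %>%
  ungroup() %>%
  arrange(datetime) 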

result

# A tibble: 32 x 2
     datetime       value
       <date>       <dbl>
 1 2016-12-31  70.0000000
 2 2017-01-01 218.6726454
 3 2017-01-02 244.3821258
 4 2017-01-03 257.7136326
 5 2017-01-04 220.4788443
 6 2017-01-05 230.3729744
 7 2017-01-06 248.5082639
 8 2017-01-07 176.5511818
 9 2017-01-08  -0.8307824
10 2017-01-09  -0.6343781
# ... with 22 more rows
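
Because rbind() followed by sum() adds the hourly sums and the daily values together in a single column, a reader who wants the two measurements side by side could use a join instead. The following is an alternative sketch, not the code of this answer; the column names `date` and `sum_value` are chosen here for illustration, and base R arithmetic replaces the lubridate helpers:

```r
library(dplyr)

# Example data as in the question (base R timestamp construction)
set.seed(1L)
hourly <- data.frame(
  datetime = seq(from = as.POSIXct("2017-01-01 01:00", tz = "UTC"),
                 length.out = 168, by = "hour"),
  value = rpois(168, 10))
daily <- data.frame(
  datetime = seq(from = as.Date("2017-01-01"), length.out = 31, by = "day"),
  value = rnorm(31))

joined <- hourly %>%
  # shift back by 8 hours so each 8-to-8 period maps to its start date
  mutate(date = as.Date(datetime - 8 * 3600, tz = "UTC")) %>%
  group_by(date) %>%
  summarise(sum_value = sum(value)) %>%
  # keep only dates present in both sources, one column per measurement
  inner_join(daily, by = c("date" = "datetime"))
```

With the example data this keeps the seven dates 2017-01-01 to 2017-01-07, with the hourly sum in sum_value and the daily measurement in value.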

Upvotes: 1

Edgar Santos

Reputation: 3504

You could do it using cut, e.g.:

library(lubridate)
library(dplyr)
brk = seq(ymd_hm(paste(as.Date(min(hourly$datetime) - days(1)), "08:00"), tz = "UTC"),
          ymd_hm(paste(as.Date(max(hourly$datetime) + days(1)), "08:00"), tz = "UTC"),
          by = "24 hours")
hourly$cut <- ymd_hms(cut.POSIXt(hourly$datetime, breaks = brk))
hourly2 <- hourly %>% group_by(cut) %>% summarize(value = sum(value)) 
hourly2$cut <- as.Date(hourly2$cut)
names(hourly2) <- names(daily)
comb <- rbind(hourly2, daily) %>% group_by(datetime) %>% summarize(value = sum(value))

     datetime       value
       <date>       <dbl>
 1 2016-12-31  52.0000000
 2 2017-01-01 241.5612137
 3 2017-01-02 244.3689032
 4 2017-01-03 271.3156334
 5 2017-01-04 253.8221333
 6 2017-01-05 238.5790170
 7 2017-01-06 220.7118064
 8 2017-01-07 167.5018586
 9 2017-01-08  -0.2962494
10 2017-01-09   0.4126310
# ... with 22 more rows
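
How cut() assigns timestamps to 08:00 breaks can be checked on a few values. This is a minimal base R sketch with made-up timestamps; it relies on the default right = FALSE, under which each interval includes its lower break, so a timestamp of exactly 08:00 starts a new period:

```r
# Three illustrative timestamps: inside a period, just before the next
# break, and exactly on a break
x <- as.POSIXct(c("2017-09-12 09:00", "2017-09-13 07:00", "2017-09-13 08:00"),
                tz = "UTC")

# Breaks at 08:00 on three consecutive days
brk <- seq(as.POSIXct("2017-09-12 08:00", tz = "UTC"),
           by = "24 hours", length.out = 3)

# Index of the interval [brk[i], brk[i + 1]) each timestamp falls into
bin <- as.integer(cut(x, breaks = brk))

# Date on which the containing period starts
period_date <- as.Date(brk[bin], tz = "UTC")
```

The first two timestamps fall into the period starting on 2017-09-12 and the third into the one starting on 2017-09-13. Subtracting 8 hours and truncating to the date gives the same assignment.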

Upvotes: 1
