Plot Device
Plot Device

Reputation: 67

Disaggregate data by date from a higher timeframe to a lower one with lag

I have two large sets of data, one that has value/timestamp pair a on a very shot time frame and the other is the same pair, but on a daily timeframe. What I would like to do is apply the daily value of the prior day to the current day on the lower timeframe. For example:

set 1:

timestamp           value
2019-10-03 23:24:00 2
2019-10-03 23:27:00 2
2019-10-03 23:30:00 3
2019-10-03 23:33:00 4
2019-10-03 23:36:00 4
2019-10-03 23:39:00 5
2019-10-03 23:42:00 5
2019-10-03 23:45:00 5
2019-10-03 23:48:00 5
2019-10-03 23:51:00 5
2019-10-03 23:54:00 5
2019-10-03 23:57:00 8
2019-10-04 0:00:00  8
2019-10-04 0:03:00  8
2019-10-04 0:06:00  8
2019-10-04 0:09:00  5
2019-10-04 0:12:00  5
2019-10-04 0:15:00  8
2019-10-04 0:18:00  6
2019-10-04 0:21:00  6
2019-10-04 0:24:00  6
2019-10-04 0:27:00  6
2019-10-04 0:30:00  7
2019-10-04 0:33:00  7
2019-10-04 0:36:00  7
2019-10-04 0:39:00  7
2019-10-04 0:42:00  7

set 2:

date    value2
2019-10-01  20
2019-10-02  40
2019-10-03  35
2019-10-04  14
2019-10-05  99
2019-10-06  23
2019-10-07  11
2019-10-08  67
2019-10-09  44
2019-10-10  32
2019-10-11  78

Desired Output:

timestamp         value value2
2019-10-03 23:24:00 2   40
2019-10-03 23:27:00 2   40
2019-10-03 23:30:00 3   40
2019-10-03 23:33:00 4   40
2019-10-03 23:36:00 4   40
2019-10-03 23:39:00 5   40
2019-10-03 23:42:00 5   40
2019-10-03 23:45:00 5   40
2019-10-03 23:48:00 5   40
2019-10-03 23:51:00 5   40
2019-10-03 23:54:00 5   40
2019-10-03 23:57:00 8   40
2019-10-04 0:00:00  8   35
2019-10-04 0:03:00  8   35
2019-10-04 0:06:00  8   35
2019-10-04 0:09:00  5   35
2019-10-04 0:12:00  5   35
2019-10-04 0:15:00  8   35
2019-10-04 0:18:00  6   35
2019-10-04 0:21:00  6   35
2019-10-04 0:24:00  6   35
2019-10-04 0:27:00  6   35
2019-10-04 0:30:00  7   35
2019-10-04 0:33:00  7   35
2019-10-04 0:36:00  7   35
2019-10-04 0:39:00  7   35
2019-10-04 0:42:00  7   35

And that would go on and on through the rest of the data set. I have been focused on dplyr, and have tried to get this to work using lubridate too, but I am not sure exactly how to get everything to work. I have also tried to convert the timestamps to different factors, so they would play nice with each other, but I really haven't made progress in any direction.

Upvotes: 0

Views: 50

Answers (2)

prosoitos
prosoitos

Reputation: 7397

If you don't want to bother converting your strings to dates, you can use regular expressions:

df1$date <- sub("(.*) .*", "\\1", df1$timestamp)
result <- merge(df1, df2, by = "date")
result$date <- NULL

Demo with minimal data:

df1 <- data.frame(
  timestamp = c("2019-10-03 23:24:00", "2019-10-03 23:27:00", "2019-10-04 0:15:00"),
  value = c(2, 2, 8)
)

df2 <- data.frame(
  date = c("2019-10-01", "2019-10-03", "2019-10-04"),
  value2 = c(20, 35, 14)
)

df1$date <- sub("(.*) .*", "\\1", df1$timestamp)
result <- merge(df1, df2, by = 'date')
result$date <- NULL

Output:

> result
            timestamp value value2
1 2019-10-03 23:24:00     2     35
2 2019-10-03 23:27:00     2     35
3  2019-10-04 0:15:00     8     14

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389155

You can subtract -1 from date in df1 and join the data with df2 by date.

library(dplyr)
library(lubridate)

df1 %>%
  mutate(timestamp = ymd_hms(timestamp), 
         date = as.Date(timestamp) - 1) %>%
  left_join(df2 %>% mutate(date = ymd(date)), by ='date')

This returns

#             timestamp value       date value2
#1  2019-10-03 23:24:00     2 2019-10-02     40
#2  2019-10-03 23:27:00     2 2019-10-02     40
#3  2019-10-03 23:30:00     3 2019-10-02     40
#4  2019-10-03 23:33:00     4 2019-10-02     40
#5  2019-10-03 23:36:00     4 2019-10-02     40
#6  2019-10-03 23:39:00     5 2019-10-02     40
#7  2019-10-03 23:42:00     5 2019-10-02     40
#8  2019-10-03 23:45:00     5 2019-10-02     40
#9  2019-10-03 23:48:00     5 2019-10-02     40
#10 2019-10-03 23:51:00     5 2019-10-02     40
#11 2019-10-03 23:54:00     5 2019-10-02     40
#12 2019-10-03 23:57:00     8 2019-10-02     40
#13 2019-10-04 00:00:00     8 2019-10-03     35
#14 2019-10-04 00:03:00     8 2019-10-03     35
#15 2019-10-04 00:06:00     8 2019-10-03     35
#16 2019-10-04 00:09:00     5 2019-10-03     35
#17 2019-10-04 00:12:00     5 2019-10-03     35
#18 2019-10-04 00:15:00     8 2019-10-03     35
#19 2019-10-04 00:18:00     6 2019-10-03     35
#20 2019-10-04 00:21:00     6 2019-10-03     35
#21 2019-10-04 00:24:00     6 2019-10-03     35
#22 2019-10-04 00:27:00     6 2019-10-03     35
#23 2019-10-04 00:30:00     7 2019-10-03     35
#24 2019-10-04 00:33:00     7 2019-10-03     35
#25 2019-10-04 00:36:00     7 2019-10-03     35
#26 2019-10-04 00:39:00     7 2019-10-03     35
#27 2019-10-04 00:42:00     7 2019-10-03     35

In base R, you can write the same code as :

df1$timestamp <- as.POSIXct(df1$timestamp, format = '%Y-%m-%d %T', tz = 'UTC')
df1$date <- as.Date(df1$timestamp) - 1
df2$date <- as.Date(df2$date)
merge(df1, df2, by = 'date')

Upvotes: 1

Related Questions