Reputation: 67
I have two large sets of data, one that has value/timestamp pair a on a very shot time frame and the other is the same pair, but on a daily timeframe. What I would like to do is apply the daily value of the prior day to the current day on the lower timeframe. For example:
set 1:
timestamp value
2019-10-03 23:24:00 2
2019-10-03 23:27:00 2
2019-10-03 23:30:00 3
2019-10-03 23:33:00 4
2019-10-03 23:36:00 4
2019-10-03 23:39:00 5
2019-10-03 23:42:00 5
2019-10-03 23:45:00 5
2019-10-03 23:48:00 5
2019-10-03 23:51:00 5
2019-10-03 23:54:00 5
2019-10-03 23:57:00 8
2019-10-04 0:00:00 8
2019-10-04 0:03:00 8
2019-10-04 0:06:00 8
2019-10-04 0:09:00 5
2019-10-04 0:12:00 5
2019-10-04 0:15:00 8
2019-10-04 0:18:00 6
2019-10-04 0:21:00 6
2019-10-04 0:24:00 6
2019-10-04 0:27:00 6
2019-10-04 0:30:00 7
2019-10-04 0:33:00 7
2019-10-04 0:36:00 7
2019-10-04 0:39:00 7
2019-10-04 0:42:00 7
set 2:
date value2
2019-10-01 20
2019-10-02 40
2019-10-03 35
2019-10-04 14
2019-10-05 99
2019-10-06 23
2019-10-07 11
2019-10-08 67
2019-10-09 44
2019-10-10 32
2019-10-11 78
Desired Output:
timestamp value value2
2019-10-03 23:24:00 2 40
2019-10-03 23:27:00 2 40
2019-10-03 23:30:00 3 40
2019-10-03 23:33:00 4 40
2019-10-03 23:36:00 4 40
2019-10-03 23:39:00 5 40
2019-10-03 23:42:00 5 40
2019-10-03 23:45:00 5 40
2019-10-03 23:48:00 5 40
2019-10-03 23:51:00 5 40
2019-10-03 23:54:00 5 40
2019-10-03 23:57:00 8 40
2019-10-04 0:00:00 8 35
2019-10-04 0:03:00 8 35
2019-10-04 0:06:00 8 35
2019-10-04 0:09:00 5 35
2019-10-04 0:12:00 5 35
2019-10-04 0:15:00 8 35
2019-10-04 0:18:00 6 35
2019-10-04 0:21:00 6 35
2019-10-04 0:24:00 6 35
2019-10-04 0:27:00 6 35
2019-10-04 0:30:00 7 35
2019-10-04 0:33:00 7 35
2019-10-04 0:36:00 7 35
2019-10-04 0:39:00 7 35
2019-10-04 0:42:00 7 35
And that would go on and on through the rest of the data set. I have been focused on dplyr, and have tried to get this to work using lubridate too, but I am not sure exactly how to get everything to work. I have also tried to convert the timestamps to different factors, so they would play nice with each other, but I really haven't made progress in any direction.
Upvotes: 0
Views: 50
Reputation: 7397
If you don't want to bother converting your strings to dates, you can use regular expressions:
df1$date <- sub("(.*) .*", "\\1", df1$timestamp)
result <- merge(df1, df2, by = "date")
result$date <- NULL
Demo with minimal data:
df1 <- data.frame(
timestamp = c("2019-10-03 23:24:00", "2019-10-03 23:27:00", "2019-10-04 0:15:00"),
value = c(2, 2, 8)
)
df2 <- data.frame(
date = c("2019-10-01", "2019-10-03", "2019-10-04"),
value2 = c(20, 35, 14)
)
df1$date <- sub("(.*) .*", "\\1", df1$timestamp)
result <- merge(df1, df2, by = 'date')
result$date <- NULL
Output:
> result
timestamp value value2
1 2019-10-03 23:24:00 2 35
2 2019-10-03 23:27:00 2 35
3 2019-10-04 0:15:00 8 14
Upvotes: 0
Reputation: 389155
You can subtract -1 from date in df1
and join the data with df2
by date
.
library(dplyr)
library(lubridate)
df1 %>%
mutate(timestamp = ymd_hms(timestamp),
date = as.Date(timestamp) - 1) %>%
left_join(df2 %>% mutate(date = ymd(date)), by ='date')
This returns
# timestamp value date value2
#1 2019-10-03 23:24:00 2 2019-10-02 40
#2 2019-10-03 23:27:00 2 2019-10-02 40
#3 2019-10-03 23:30:00 3 2019-10-02 40
#4 2019-10-03 23:33:00 4 2019-10-02 40
#5 2019-10-03 23:36:00 4 2019-10-02 40
#6 2019-10-03 23:39:00 5 2019-10-02 40
#7 2019-10-03 23:42:00 5 2019-10-02 40
#8 2019-10-03 23:45:00 5 2019-10-02 40
#9 2019-10-03 23:48:00 5 2019-10-02 40
#10 2019-10-03 23:51:00 5 2019-10-02 40
#11 2019-10-03 23:54:00 5 2019-10-02 40
#12 2019-10-03 23:57:00 8 2019-10-02 40
#13 2019-10-04 00:00:00 8 2019-10-03 35
#14 2019-10-04 00:03:00 8 2019-10-03 35
#15 2019-10-04 00:06:00 8 2019-10-03 35
#16 2019-10-04 00:09:00 5 2019-10-03 35
#17 2019-10-04 00:12:00 5 2019-10-03 35
#18 2019-10-04 00:15:00 8 2019-10-03 35
#19 2019-10-04 00:18:00 6 2019-10-03 35
#20 2019-10-04 00:21:00 6 2019-10-03 35
#21 2019-10-04 00:24:00 6 2019-10-03 35
#22 2019-10-04 00:27:00 6 2019-10-03 35
#23 2019-10-04 00:30:00 7 2019-10-03 35
#24 2019-10-04 00:33:00 7 2019-10-03 35
#25 2019-10-04 00:36:00 7 2019-10-03 35
#26 2019-10-04 00:39:00 7 2019-10-03 35
#27 2019-10-04 00:42:00 7 2019-10-03 35
In base R, you can write the same code as :
df1$timestamp <- as.POSIXct(df1$timestamp, format = '%Y-%m-%d %T', tz = 'UTC')
df1$date <- as.Date(df1$timestamp) - 1
df2$date <- as.Date(df2$date)
merge(df1, df2, by = 'date')
Upvotes: 1