Reputation: 207
I have two data frames with time data in POSIXct format and a corresponding location which I need to match. One dataset has time in a series of 30 minute bins, along with location data.
location datetimes date shark
SS04 2018-03-20 08:00:00 2018-03-20 A
Absent 2018-03-20 08:30:00 2018-03-20 A
Absent 2018-03-20 09:00:00 2018-03-20 A
Absent 2018-03-20 09:30:00 2018-03-20 A
SS04 2018-03-20 10:00:00 2018-03-20 A
Absent 2018-03-20 10:30:00 2018-03-20 A
The second dataset has time data recorded every 2 minutes.
shark depth temperature datetime date
A 49.5 26.2 20/03/2018 08:00 20/03/2018
A 49.5 25.3 20/03/2018 08:02 20/03/2018
A 53.0 24.2 20/03/2018 08:04 20/03/2018
A 39.5 26.5 20/03/2018 08:28 20/03/2018
A 43.0 26.2 20/03/2018 09:10 20/03/2018
A 44.5 26.5 20/03/2018 10:34 20/03/2018
I need to match the time bin (datetimes) from the first dataset to the time data (datetime) in the second, based on the location data, so that all the data in the second dataset corresponding to a bin in the first dataset has the location value assigned to all values over the 30 minute period.
I think I could use data.table but I'm not confident in how to approach this.
I would ideally like to create a dataset such as this, with location from the first dataset added to the second dataset, based on the corresponding time bin from the first dataset.
shark depth temperature datetime date location
A 49.5 26.2 20/03/2018 08:00 20/03/2018 SS04
A 49.5 25.3 20/03/2018 08:02 20/03/2018 SS04
A 53.0 24.2 20/03/2018 08:04 20/03/2018 SS04
A 39.5 26.5 20/03/2018 08:32 20/03/2018 Absent
A 43.0 26.2 20/03/2018 09:10 20/03/2018 Absent
A 44.5 26.5 20/03/2018 10:18 20/03/2018 SS04
Upvotes: 1
Views: 381
Reputation: 28825
data30min$datetimesE <- data30min$datetimes + 30 * 60 #in_seconds
library(sqldf)
sqldf('select d2.*,d30.location
from data2min d2
left join data30min d30
on d2.datetime between d30.datetimes and d30.datetimesE
')
#> shark depth temperature datetime date location
#> 1 A 49.5 26.2 2018-03-20 08:00:00 20/03/2018 SS04
#> 2 A 49.5 25.3 2018-03-20 08:02:00 20/03/2018 SS04
#> 3 A 53.0 24.2 2018-03-20 08:04:00 20/03/2018 SS04
#> 4 A 39.5 26.5 2018-03-20 08:28:00 20/03/2018 SS04
#> 5 A 43.0 26.2 2018-03-20 09:10:00 20/03/2018 Absent
#> 6 A 44.5 26.5 2018-03-20 10:34:00 20/03/2018 Absent
Data:
data2min <- structure(list(shark = c("A", "A", "A", "A", "A", "A"), depth = c(49.5,
49.5, 53, 39.5, 43, 44.5), temperature = c(26.2, 25.3, 24.2,
26.5, 26.2, 26.5), datetime = structure(c(1521547200, 1521547320,
1521547440, 1521548880, 1521551400, 1521556440), class = c("POSIXct",
"POSIXt"), tzone = ""), date = c("20/03/2018", "20/03/2018",
"20/03/2018", "20/03/2018", "20/03/2018", "20/03/2018")), row.names = c(NA,
-6L), class = "data.frame")
data30min <- structure(list(location = c("SS04", "Absent", "Absent", "Absent",
"SS04", "Absent"), datetimes = structure(c(1521547200, 1521549000,
1521550800, 1521552600, 1521554400, 1521556200), class = c("POSIXct",
"POSIXt"), tzone = ""), date = c("2018-03-20", "2018-03-20",
"2018-03-20", "2018-03-20", "2018-03-20", "2018-03-20"), shark = c("A",
"A", "A", "A", "A", "A"), datetimesE = structure(c(1521549000,
1521550800, 1521552600, 1521554400, 1521556200, 1521558000), class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -6L), class = "data.frame")
Upvotes: 1
Reputation: 27732
use a data.table non-equi join
sample data
library( data.table)
DT1 <- fread('
location datetimes date shark
SS04 "2018-03-20 08:00:00" 2018-03-20 A
Absent "2018-03-20 08:30:00" 2018-03-20 A
Absent "2018-03-20 09:00:00" 2018-03-20 A
Absent "2018-03-20 09:30:00" 2018-03-20 A
SS04 "2018-03-20 10:00:00" 2018-03-20 A
Absent "2018-03-20 10:30:00" 2018-03-20 A')
DT2 <- fread('
shark depth temperature datetime date
A 49.5 26.2 "20/03/2018 08:00" 20/03/2018
A 49.5 25.3 "20/03/2018 08:02" 20/03/2018
A 53.0 24.2 "20/03/2018 08:04" 20/03/2018
A 39.5 26.5 "20/03/2018 08:28" 20/03/2018
A 43.0 26.2 "20/03/2018 09:10" 20/03/2018
A 44.5 26.5 "20/03/2018 10:34" 20/03/2018
')
DT1[, `:=`( datetimes = as.POSIXct( datetimes, format = "%Y-%m-%d %H:%M:%S" ))]
DT2[, `:=`( datetime = as.POSIXct( datetime, format = "%d/%m/%Y %H:%M" ) )]
code
DT2[ copy(DT1)[, end := datetimes + lubridate::minutes(30)], location := i.location,
on = .( datetime >= datetimes, datetime < end)][]
output
# shark depth temperature datetime date location
# 1: A 49.5 26.2 2018-03-20 08:00:00 20/03/2018 SS04
# 2: A 49.5 25.3 2018-03-20 08:02:00 20/03/2018 SS04
# 3: A 53.0 24.2 2018-03-20 08:04:00 20/03/2018 SS04
# 4: A 39.5 26.5 2018-03-20 08:28:00 20/03/2018 SS04
# 5: A 43.0 26.2 2018-03-20 09:10:00 20/03/2018 Absent
# 6: A 44.5 26.5 2018-03-20 10:34:00 20/03/2018 Absent
Upvotes: 0