Fionn
Fionn

Reputation: 207

Merge dataframes when timestamp of one is between another one's datetime intervals

I have two data frames with time data in POSIXct format and a corresponding location which I need to match. One dataset has time in a series of 30 minute bins, along with location data.

location   datetimes        date       shark
SS04   2018-03-20 08:00:00 2018-03-20     A
Absent 2018-03-20 08:30:00 2018-03-20     A
Absent 2018-03-20 09:00:00 2018-03-20     A
Absent 2018-03-20 09:30:00 2018-03-20     A
SS04   2018-03-20 10:00:00 2018-03-20     A
Absent 2018-03-20 10:30:00 2018-03-20     A

The second dataset has time data recorded every 2 minutes.

shark       depth     temperature   datetime       date
A            49.5        26.2   20/03/2018 08:00 20/03/2018
A            49.5        25.3   20/03/2018 08:02 20/03/2018
A            53.0        24.2   20/03/2018 08:04 20/03/2018
A            39.5        26.5   20/03/2018 08:28 20/03/2018
A            43.0        26.2   20/03/2018 09:10 20/03/2018
A            44.5        26.5   20/03/2018 10:34 20/03/2018

I need to match the time bin (datetimes) from the first dataset to the time data (datetime) in the second, based on the location data, so that all the data in the second dataset corresponding to a bin in the first dataset has the location value assigned to all values over the 30 minute period.

I think I could use data.table but I'm not confident in how to approach this.

I would ideally like to create a dataset such as this, with location from the first dataset added to the second dataset, based on the corresponding time bin from the first dataset.

shark depth temperature   datetime    date      location
A     49.5  26.2   20/03/2018 08:00 20/03/2018    SS04
A     49.5  25.3   20/03/2018 08:02 20/03/2018    SS04
A     53.0  24.2   20/03/2018 08:04 20/03/2018    SS04
A     39.5  26.5   20/03/2018 08:32 20/03/2018    Absent
A     43.0  26.2   20/03/2018 09:10 20/03/2018    Absent
A     44.5  26.5   20/03/2018 10:18 20/03/2018    SS04

Upvotes: 1

Views: 381

Answers (2)

M--
M--

Reputation: 28825

data30min$datetimesE <- data30min$datetimes + 30 * 60 #in_seconds

library(sqldf)

sqldf('select d2.*,d30.location
           from data2min d2
           left join data30min d30
             on d2.datetime between d30.datetimes and d30.datetimesE
      ')

#>   shark depth temperature            datetime       date location
#> 1     A  49.5        26.2 2018-03-20 08:00:00 20/03/2018     SS04
#> 2     A  49.5        25.3 2018-03-20 08:02:00 20/03/2018     SS04
#> 3     A  53.0        24.2 2018-03-20 08:04:00 20/03/2018     SS04
#> 4     A  39.5        26.5 2018-03-20 08:28:00 20/03/2018     SS04
#> 5     A  43.0        26.2 2018-03-20 09:10:00 20/03/2018   Absent
#> 6     A  44.5        26.5 2018-03-20 10:34:00 20/03/2018   Absent

Data:

data2min <- structure(list(shark = c("A", "A", "A", "A", "A", "A"), depth = c(49.5, 
49.5, 53, 39.5, 43, 44.5), temperature = c(26.2, 25.3, 24.2, 
26.5, 26.2, 26.5), datetime = structure(c(1521547200, 1521547320, 
1521547440, 1521548880, 1521551400, 1521556440), class = c("POSIXct", 
"POSIXt"), tzone = ""), date = c("20/03/2018", "20/03/2018", 
"20/03/2018", "20/03/2018", "20/03/2018", "20/03/2018")), row.names = c(NA, 
-6L), class = "data.frame")

data30min <- structure(list(location = c("SS04", "Absent", "Absent", "Absent", 
"SS04", "Absent"), datetimes = structure(c(1521547200, 1521549000, 
1521550800, 1521552600, 1521554400, 1521556200), class = c("POSIXct", 
"POSIXt"), tzone = ""), date = c("2018-03-20", "2018-03-20", 
"2018-03-20", "2018-03-20", "2018-03-20", "2018-03-20"), shark = c("A", 
"A", "A", "A", "A", "A"), datetimesE = structure(c(1521549000, 
1521550800, 1521552600, 1521554400, 1521556200, 1521558000), class = c("POSIXct", 
"POSIXt"))), row.names = c(NA, -6L), class = "data.frame")

Upvotes: 1

Wimpel
Wimpel

Reputation: 27732

use a data.table non-equi join

sample data

library( data.table)

DT1 <- fread('
location   datetimes        date       shark
SS04   "2018-03-20 08:00:00" 2018-03-20     A
Absent "2018-03-20 08:30:00" 2018-03-20     A
Absent "2018-03-20 09:00:00" 2018-03-20     A
Absent "2018-03-20 09:30:00" 2018-03-20     A
SS04   "2018-03-20 10:00:00" 2018-03-20     A
Absent "2018-03-20 10:30:00" 2018-03-20     A')

DT2 <- fread('
shark       depth     temperature   datetime       date
A            49.5        26.2   "20/03/2018 08:00" 20/03/2018
A            49.5        25.3   "20/03/2018 08:02" 20/03/2018
A            53.0        24.2   "20/03/2018 08:04" 20/03/2018
A            39.5        26.5   "20/03/2018 08:28" 20/03/2018
A            43.0        26.2   "20/03/2018 09:10" 20/03/2018
A            44.5        26.5   "20/03/2018 10:34" 20/03/2018
')

DT1[, `:=`( datetimes = as.POSIXct( datetimes, format = "%Y-%m-%d %H:%M:%S" ))]
DT2[, `:=`( datetime = as.POSIXct( datetime, format = "%d/%m/%Y %H:%M" ) )]

code

DT2[ copy(DT1)[, end := datetimes + lubridate::minutes(30)], location := i.location, 
     on = .( datetime >= datetimes, datetime < end)][]

output

#    shark depth temperature            datetime       date location
# 1:     A  49.5        26.2 2018-03-20 08:00:00 20/03/2018     SS04
# 2:     A  49.5        25.3 2018-03-20 08:02:00 20/03/2018     SS04
# 3:     A  53.0        24.2 2018-03-20 08:04:00 20/03/2018     SS04
# 4:     A  39.5        26.5 2018-03-20 08:28:00 20/03/2018     SS04
# 5:     A  43.0        26.2 2018-03-20 09:10:00 20/03/2018   Absent
# 6:     A  44.5        26.5 2018-03-20 10:34:00 20/03/2018   Absent

Upvotes: 0

Related Questions