How to synchronize data with partly different timestamps in R

Question

I have data from different measurements of the same temporal event stored in two dataframes. The timestamps in the two dataframes are not 100% the same though. I'd like to merge the two dataframes by synchronizing them as well as possible.

This is df1 (reproducible data below):

df1
       rec_time   gsr
1  00:00:01.902 2.513
2  00:00:01.905 2.513
3  00:00:01.907 2.518
4  00:00:01.909 2.524
5  00:00:01.913 2.532
6  00:00:01.916 2.499
7  00:00:01.918 2.513
8  00:00:01.922 2.500
9  00:00:01.927 2.513
10 00:00:01.930 2.513
11 00:00:01.989 2.513
12 00:00:01.994 2.513
13 00:00:01.996 2.513
14 00:00:01.999 2.513
15 00:00:02.004 2.513

This is df2 (reproducible data below):

df2
    Start_Time AOI
1 00:00:01.905 Iso
2 00:00:01.914 Kat
3 00:00:01.925 Iso
4 00:00:01.930 Iso
5 00:00:01.991 Kat

As can be seen, sometimes the timestamps (rec_time in df1 and Start_Timein df2) show the same time; more often though there are differences. Now for the merge, where there are differences, I'd like to put the rows from df2 on that row in df1 where the time difference is smallest. How can this synchronization be achieved?

The expected result is this:

df3
       rec_time   gsr   Start_Time  AOI
1  00:00:01.902 2.513          
2  00:00:01.905 2.513 00:00:01.905  Iso   # same time
3  00:00:01.907 2.518          
4  00:00:01.909 2.524          
5  00:00:01.913 2.532 00:00:01.914  Kat   # different time
6  00:00:01.916 2.499          
7  00:00:01.918 2.513          
8  00:00:01.922 2.500          
9  00:00:01.927 2.513 00:00:01.925  Iso   # different time
10 00:00:01.930 2.513 00:00:01.930  Iso   # same time
11 00:00:01.989 2.513          
12 00:00:01.994 2.513 00:00:01.991  Kat   # different time
13 00:00:01.996 2.513          
14 00:00:01.999 2.513          
15 00:00:02.004 2.513

What I've tried so far is meager:

merge(df1, df2, by.x = "rec_time", by.y = "Start_Time", all.x = T)

This obviously only catches those rows that have identical timestamps.

Reproducible data:

df1 <- dput(gsr)
structure(list(rec_time = structure(1:15, .Label = c("00:00:01.902", 
"00:00:01.905", "00:00:01.907", "00:00:01.909", "00:00:01.913", 
"00:00:01.916", "00:00:01.918", "00:00:01.922", "00:00:01.927", 
"00:00:01.930", "00:00:01.989", "00:00:01.994", "00:00:01.996", 
"00:00:01.999", "00:00:02.004"), class = "factor"), gsr = c(2.513, 
2.513, 2.518, 2.524, 2.532, 2.499, 2.513, 2.5, 2.513, 2.513, 
2.513, 2.513, 2.513, 2.513, 2.513)), class = "data.frame", row.names = c(NA, 
-15L))

df2 <- dput(et)
structure(list(Start_Time = structure(1:5, .Label = c("00:00:01.905", 
"00:00:01.914", "00:00:01.925", "00:00:01.930", "00:00:01.991"
), class = "factor"), AOI = structure(c(1L, 2L, 1L, 1L, 2L), .Label = c("Iso", 
"Kat"), class = "factor")), class = "data.frame", row.names = c(NA, 
-5L))

Allan Cameron · Accepted Answer

I think it would be easier to work with the data if we first converted the timestamps to milliseconds:

library(dplyr)

timestamp_to_ms <- function(t) {
  sapply(strsplit(as.character(t), ":"), function(x) {
    1000 * sum(c(3600, 60, 1) * as.numeric(x))
  })
}

df1$ms_time <- timestamp_to_ms(df1$rec_time)
df2$ms_time <- timestamp_to_ms(df2$Start_Time)

Now that we have the time in milliseconds in both data frames, it is easy to find the nearest matching timestamps and record them in df2:

df2$ms_time <- sapply(df2$ms_time, function(x) {
  df1$ms_time[which.min(abs(x - df1$ms_time))]
})

This then allows a standard left_join:

df3 <- left_join(df1, df2, by = "ms_time") %>% 
         select(-ms_time)

df3
#>        rec_time   gsr   Start_Time  AOI
#> 1  00:00:01.902 2.513          
#> 2  00:00:01.905 2.513 00:00:01.905  Iso
#> 3  00:00:01.907 2.518          
#> 4  00:00:01.909 2.524          
#> 5  00:00:01.913 2.532 00:00:01.914  Kat
#> 6  00:00:01.916 2.499          
#> 7  00:00:01.918 2.513          
#> 8  00:00:01.922 2.500          
#> 9  00:00:01.927 2.513 00:00:01.925  Iso
#> 10 00:00:01.930 2.513 00:00:01.930  Iso
#> 11 00:00:01.989 2.513 00:00:01.991  Kat
#> 12 00:00:01.994 2.513          
#> 13 00:00:01.996 2.513          
#> 14 00:00:01.999 2.513          
#> 15 00:00:02.004 2.513

^{Created on 2020-09-17 by the reprex package (v0.3.0)}

How to synchronize data with partly different timestamps in R

Answers (2)

Related Questions