T X
T X

Reputation: 613

How to extract one-to-one match data which has the same time using R

I have two datasets: a & b. They are measured data, most of them are measured at the same time, while the rest can not be matched. For example:

a<-data.frame(measuretime=c("2010-10-20 11:00:00", "2010-12-15 13:18:00", "2011-02-14 09:00:00", 
                            "2011-03-08 11:52:00", "2012-08-23 22:59:00"), value=c(1.5, 6.3, 0.1, 9.9, 7))
b<-data.frame(measuretime=c("2010-12-15 13:18:00", "2011-02-14 10:30:00", 
                            "2011-03-08 11:52:00", "2011-04-18 12:23:00"), value=c(22, 71, 12, 69))

I want to filter those data whose measured time is the same, so that I can match them and analyze later. That is, the dataset a and b after filtering should be:

a:

  measuretime      value
2010/12/15 13:18    6.3
2011/3/8 11:52      9.9

b:

  measuretime      value
2010/12/15 13:18    22
2011/3/8 11:52      12

After the operation, a and b are measured at the same date: 2010/12/15 13:18 and 2011/3/8 11:52. Could anyone have the method to achive this in R?

Upvotes: 0

Views: 45

Answers (2)

Konrad
Konrad

Reputation: 18585

Notes

I would strongly urge you not comparing timestamps as strings and initially bringing those to a timestamp / date format, as suggested below. The reasons behind it are mostly concerned with:

  • possible errors concerned with timestamps coded in different formats, containing time zones, milliseconds and so forth.
  • By the same token, something that is colloquially defined as "same time" will be expressed differently depending on particularities of the measurement system, etc. In practice, frequently will be interested in comparing events that happen within the same interval. Let's say event B taking place within +/- 5 seconds from event A.

Notes on the solution

  • The solution offered here is simplistic and rounds event to the minute using functions offered within lubridate package. You may consider whether this is suitable or whether it would be wiser to introduce a more sophisticated approach and/or compare events within the same hour / day
  • I would suggest that you have a look at ?lubridate::interval to see whether constructing intervals would be useful in this context
  • In effect you will never compare events that took place at the same time; you will compare events that are recorded as at the same time. Your definition of "same time" depends on the precision of instrument and the granularity of the stored data. If your instrument captures minutes, events that are recorded at the same minute would be considered as happening at the same time. In my view, it's more robust to explicitly capture this in your code.

Code

# Data --------------------------------------------------------------------

a <-
    data.frame(
        measuretime = c(
            "2010-10-20 11:00:00",
            "2010-12-15 13:18:00",
            "2011-02-14 09:00:00",
            "2011-03-08 11:52:00",
            "2012-08-23 22:59:00"
        ),
        value = c(1.5, 6.3, 0.1, 9.9, 7)
    )
b <-
    data.frame(
        measuretime = c(
            "2010-12-15 13:18:00",
            "2011-02-14 10:30:00",
            "2011-03-08 11:52:00",
            "2011-04-18 12:23:00"
        ),
        value = c(22, 71, 12, 69)
    )


# Timestamps --------------------------------------------------------------

suppressPackageStartupMessages(expr = {
    library("tidyverse")
    library("magrittr")
    library("lubridate")
})

# Create timestamp columns

a %<>%
    mutate(measuretime_ts = as_datetime(measuretime))

b %<>%
    mutate(measuretime_ts = as_datetime(measuretime))


# Compare -----------------------------------------------------------------

# Round columns to a minute and leave only columns that 
# are recorded as at the same minute after rounding

a %>%
    mutate(measuretime_ts_rnd = round_date(x = measuretime_ts,
                                           unit = "minute")) %>%
    inner_join(
        y = b %>%
            mutate(measuretime_ts_rnd = round_date(x = measuretime_ts,
                                                   unit = "minute")),
        by = c("measuretime_ts_rnd" = "measuretime_ts_rnd")
    )

Upvotes: 2

clmarquart
clmarquart

Reputation: 4711

If you'd like the result as a new data.frame with only the times that exist in both a and b:

c <- merge(a, b, by = "measuretime")

Or to keep all rows, filled with NA where the measuretime doesn't exist in the other, you can use the all parameter set to TRUE:

d <- merge(a, b, by = "measuretime", all = TRUE)

Upvotes: 2

Related Questions