Reputation: 613
I have two datasets: a & b. They are measured data, most of them are measured at the same time, while the rest can not be matched. For example:
a<-data.frame(measuretime=c("2010-10-20 11:00:00", "2010-12-15 13:18:00", "2011-02-14 09:00:00",
"2011-03-08 11:52:00", "2012-08-23 22:59:00"), value=c(1.5, 6.3, 0.1, 9.9, 7))
b<-data.frame(measuretime=c("2010-12-15 13:18:00", "2011-02-14 10:30:00",
"2011-03-08 11:52:00", "2011-04-18 12:23:00"), value=c(22, 71, 12, 69))
I want to filter those data whose measured time is the same, so that I can match them and analyze later. That is, the dataset a and b after filtering should be:
a:
measuretime value
2010/12/15 13:18 6.3
2011/3/8 11:52 9.9
b:
measuretime value
2010/12/15 13:18 22
2011/3/8 11:52 12
After the operation, a and b are measured at the same date: 2010/12/15 13:18 and 2011/3/8 11:52. Could anyone have the method to achive this in R?
Upvotes: 0
Views: 45
Reputation: 18585
I would strongly urge you not comparing timestamps as strings and initially bringing those to a timestamp / date format, as suggested below. The reasons behind it are mostly concerned with:
B
taking place within +/- 5 seconds
from event A
. lubridate
package. You may consider whether this is suitable or whether it would be wiser to introduce a more sophisticated approach and/or compare events within the same hour / day?lubridate::interval
to see whether constructing intervals would be useful in this context# Data --------------------------------------------------------------------
a <-
data.frame(
measuretime = c(
"2010-10-20 11:00:00",
"2010-12-15 13:18:00",
"2011-02-14 09:00:00",
"2011-03-08 11:52:00",
"2012-08-23 22:59:00"
),
value = c(1.5, 6.3, 0.1, 9.9, 7)
)
b <-
data.frame(
measuretime = c(
"2010-12-15 13:18:00",
"2011-02-14 10:30:00",
"2011-03-08 11:52:00",
"2011-04-18 12:23:00"
),
value = c(22, 71, 12, 69)
)
# Timestamps --------------------------------------------------------------
suppressPackageStartupMessages(expr = {
library("tidyverse")
library("magrittr")
library("lubridate")
})
# Create timestamp columns
a %<>%
mutate(measuretime_ts = as_datetime(measuretime))
b %<>%
mutate(measuretime_ts = as_datetime(measuretime))
# Compare -----------------------------------------------------------------
# Round columns to a minute and leave only columns that
# are recorded as at the same minute after rounding
a %>%
mutate(measuretime_ts_rnd = round_date(x = measuretime_ts,
unit = "minute")) %>%
inner_join(
y = b %>%
mutate(measuretime_ts_rnd = round_date(x = measuretime_ts,
unit = "minute")),
by = c("measuretime_ts_rnd" = "measuretime_ts_rnd")
)
Upvotes: 2
Reputation: 4711
If you'd like the result as a new data.frame with only the times that exist in both a
and b
:
c <- merge(a, b, by = "measuretime")
Or to keep all rows, filled with NA
where the measuretime
doesn't exist in the other, you can use the all
parameter set to TRUE
:
d <- merge(a, b, by = "measuretime", all = TRUE)
Upvotes: 2