Reputation: 2267
I have a 2 dataframes like this:
df1
ID <- c("A","B","A","C","C","B","B","A")
StartDatetime <- c("2015-09-29 00:00:13", "2015-09-29 05:55:50", "2015-09-29 11:45:14", "2015-09-29 15:24:00",
"2015-09-29 17:24:12", "2015-09-29 21:34:31", "2015-09-29 22:22:22", "2015-09-29 23:38:22")
EndDatetime <- c("2015-09-29 00:13:56", "2015-09-29 06:13:50", "2015-09-29 12:23:14", "2015-09-29 15:58:00",
"2015-09-29 17:58:17", "2015-09-29 22:06:31", "2015-09-29 22:52:28", "2015-09-29 23:55:22")
MEASUREMENT <- c("Length","Length","Width","Length","Width","Height","Length","Height")
df1 <- data.frame(ID,StartDatetime,EndDatetime,MEASUREMENT)
df2
ID <- c("A","B","A","C","C","B","B")
MStart <- c("09/29/2015 00:02:13", "09/29/2015 05:56:50", "09/30/2015 11:55:14", "09/29/2015 15:33:00",
"09/29/2015 17:28:12", "09/29/2015 21:30:31", "09/29/2015 22:26:22")
MEnd <- c("09/29/2015 00:11:12", "09/29/2015 06:55:50", "09/30/2015 11:54:14", "09/29/2015 15:47:00",
"09/29/2015 17:44:12", "09/29/2015 22:02:31", "09/29/2015 22:44:22")
Measurement <- c("Length","Length","Width","Length","Width","Height","Length")
df2 <- data.frame(ID,MStart,MEnd,Measurement)
I am trying to solve an interesting problem of checking if the ID's having the MStart & MEnd in df2 are within the date time ranges in df1 for every measurement. The logic retuns
TRUE if (MStart & MEnd) **is within** (StartDatetime & EndDatetime)
FALSE if (MStart & MEnd) **is not within** (StartDatetime & EndDatetime)
My Desired output will be df3 with all the columns in df1 and also add a column with True or False values.
df3
ID StartDatetime EndDatetime MEASUREMENT True_False
1 A 2015-09-29 00:00:13 2015-09-29 00:13:56 Length TRUE
2 B 2015-09-29 05:55:50 2015-09-29 06:13:50 Length FALSE
3 A 2015-09-29 11:55:14 2015-09-29 12:23:14 Width FALSE
4 C 2015-09-29 15:24:00 2015-09-29 15:58:00 Length TRUE
5 C 2015-09-29 17:24:12 2015-09-29 17:58:17 Width TRUE
6 B 2015-09-29 21:34:31 2015-09-29 22:06:31 Height FALSE
7 B 2015-09-29 22:22:22 2015-09-29 22:52:28 Length TRUE
8 A 2015-09-29 23:38:22 2015-09-29 23:55:22 Height FALSE
I am stuck at this error when trying to convert the date format of df2 and not able to move forward.
**df2$MStart <- as.POSIXct(df2$MStart,"%Y-%m-%d %H:%M:%S")**
Error in as.POSIXlt.character(as.character(x), ...) :
character string is not in a standard unambiguous format
Please guide me on how I could solve this problem. I am trying to use dplyr or data.table for solving this but not knowing the logic for working with date times.
EDIT I just made an edit and removed the last row of df2 and hence it has only 7 rows now. I want to address this case as well since my bigger dataset contains more rows in df1 and fewer rows in df2 and hence would like to return those unmatched rows with FALSE from df1 as well.
Upvotes: 3
Views: 195
Reputation: 4024
library(dplyr)
library(lubridate)
df1_fix =
df1 %>%
mutate(StartDatetime = ymd_hms(StartDatetime),
EndDatetime = ymd_hms(EndDatetime)) %>%
rename(Measurement = MEASUREMENT)
df2_fix =
df2 %>%
mutate(MStart = mdy_hms(MStart),
MEnd = mdy_hms(MEnd))
initial_join =
df1_fix %>%
full_join(df2_fix)
initial_join %>%
filter(StartDatetime < EndDatetime,
MStart < MEnd) %>%
mutate(test =
int_overlaps(
interval(StartDatetime, EndDatetime),
interval(MStart, MEnd) ) ) %>%
right_join(initial_join) %>%
mutate(test = test %in% TRUE)
Upvotes: 3
Reputation: 641
Here's what I would do, which is similar to bramtayl's solution:
library(dplyr)
library(lubridate)
df<-df1
df$StartDatetime <- ymd_hms(StartDatetime)
df$EndDatetime <- ymd_hms(EndDatetime)
df$MStart<- mdy_hms(MStart)
df$MEnd<- mdy_hms(MEnd)
df3 <- df%>%
mutate(True_False = ifelse(MStart >StartDatetime & MStart <EndDatetime &MEnd >StartDatetime & MEnd <EndDatetime,TRUE,FALSE))
Upvotes: 0
Reputation: 992
library(lubridate)
df1$interval <- interval(ymd_hms(df1$StartDatetime),ymd_hms(df1$EndDatetime))
df2$interval <- interval(mdy_hms(df2$MStart),mdy_hms(df1$MEnd))
df1$True_False <- ifelse((df1$interval > df2$interval), "TRUE", "FALSE")
df1
Upvotes: 1