Sharath
Sharath

Reputation: 2267

Check if values of datetime column in df2 is within datateime values of df1 in R

I have a 2 dataframes like this:

df1

ID <- c("A","B","A","C","C","B","B","A")
StartDatetime <- c("2015-09-29 00:00:13", "2015-09-29 05:55:50", "2015-09-29 11:45:14", "2015-09-29 15:24:00",
                               "2015-09-29 17:24:12", "2015-09-29 21:34:31", "2015-09-29 22:22:22", "2015-09-29 23:38:22") 
EndDatetime <- c("2015-09-29 00:13:56", "2015-09-29 06:13:50", "2015-09-29 12:23:14", "2015-09-29 15:58:00",
                             "2015-09-29 17:58:17", "2015-09-29 22:06:31", "2015-09-29 22:52:28", "2015-09-29 23:55:22") 
MEASUREMENT <- c("Length","Length","Width","Length","Width","Height","Length","Height") 
df1 <- data.frame(ID,StartDatetime,EndDatetime,MEASUREMENT)

df2

ID <- c("A","B","A","C","C","B","B")
MStart <- c("09/29/2015 00:02:13", "09/29/2015 05:56:50", "09/30/2015 11:55:14", "09/29/2015 15:33:00",
                    "09/29/2015 17:28:12", "09/29/2015 21:30:31", "09/29/2015 22:26:22") 
MEnd <- c("09/29/2015 00:11:12", "09/29/2015 06:55:50", "09/30/2015 11:54:14", "09/29/2015 15:47:00",
                  "09/29/2015 17:44:12", "09/29/2015 22:02:31", "09/29/2015 22:44:22") 
Measurement <- c("Length","Length","Width","Length","Width","Height","Length") 
df2 <- data.frame(ID,MStart,MEnd,Measurement)

I am trying to solve an interesting problem of checking if the ID's having the MStart & MEnd in df2 are within the date time ranges in df1 for every measurement. The logic retuns

TRUE if (MStart & MEnd) **is within** (StartDatetime & EndDatetime)
FALSE if (MStart & MEnd) **is not within** (StartDatetime & EndDatetime)

My Desired output will be df3 with all the columns in df1 and also add a column with True or False values.

df3

  ID       StartDatetime         EndDatetime MEASUREMENT True_False
1  A 2015-09-29 00:00:13 2015-09-29 00:13:56      Length       TRUE
2  B 2015-09-29 05:55:50 2015-09-29 06:13:50      Length      FALSE
3  A 2015-09-29 11:55:14 2015-09-29 12:23:14       Width      FALSE
4  C 2015-09-29 15:24:00 2015-09-29 15:58:00      Length       TRUE
5  C 2015-09-29 17:24:12 2015-09-29 17:58:17       Width       TRUE
6  B 2015-09-29 21:34:31 2015-09-29 22:06:31      Height      FALSE
7  B 2015-09-29 22:22:22 2015-09-29 22:52:28      Length       TRUE
8  A 2015-09-29 23:38:22 2015-09-29 23:55:22      Height      FALSE

I am stuck at this error when trying to convert the date format of df2 and not able to move forward.

**df2$MStart <- as.POSIXct(df2$MStart,"%Y-%m-%d %H:%M:%S")**

Error in as.POSIXlt.character(as.character(x), ...) : 
  character string is not in a standard unambiguous format

Please guide me on how I could solve this problem. I am trying to use dplyr or data.table for solving this but not knowing the logic for working with date times.

EDIT I just made an edit and removed the last row of df2 and hence it has only 7 rows now. I want to address this case as well since my bigger dataset contains more rows in df1 and fewer rows in df2 and hence would like to return those unmatched rows with FALSE from df1 as well.

Upvotes: 3

Views: 195

Answers (3)

bramtayl
bramtayl

Reputation: 4024

library(dplyr)
library(lubridate)

df1_fix = 
  df1 %>%
  mutate(StartDatetime = ymd_hms(StartDatetime),
         EndDatetime = ymd_hms(EndDatetime)) %>%
  rename(Measurement = MEASUREMENT)


df2_fix = 
  df2 %>%
  mutate(MStart  = mdy_hms(MStart),
         MEnd = mdy_hms(MEnd))

initial_join = 
  df1_fix %>%
  full_join(df2_fix)

initial_join %>%
  filter(StartDatetime < EndDatetime,
         MStart < MEnd) %>%
  mutate(test = 
           int_overlaps(
             interval(StartDatetime, EndDatetime),
               interval(MStart, MEnd) ) ) %>%
  right_join(initial_join) %>%
  mutate(test = test %in% TRUE)

Upvotes: 3

Pash101
Pash101

Reputation: 641

Here's what I would do, which is similar to bramtayl's solution:

library(dplyr)
library(lubridate)
df<-df1
df$StartDatetime <- ymd_hms(StartDatetime)
df$EndDatetime <- ymd_hms(EndDatetime)
df$MStart<- mdy_hms(MStart)
df$MEnd<- mdy_hms(MEnd)

df3 <- df%>%
  mutate(True_False = ifelse(MStart >StartDatetime & MStart <EndDatetime &MEnd >StartDatetime & MEnd <EndDatetime,TRUE,FALSE))

Upvotes: 0

Navin Manaswi
Navin Manaswi

Reputation: 992

library(lubridate)
df1$interval <- interval(ymd_hms(df1$StartDatetime),ymd_hms(df1$EndDatetime))
df2$interval <- interval(mdy_hms(df2$MStart),mdy_hms(df1$MEnd))
df1$True_False <- ifelse((df1$interval > df2$interval), "TRUE", "FALSE")
df1 

Upvotes: 1

Related Questions