L.Wild
L.Wild

Reputation: 45

Check if date is between two dates in another data frame, and manipulate date if it is

I have two data frames (df1 and df2); they each have an ID column, and are organized by ID number with many rows per ID for each dataframe. df1 has a "unique_posix" column, and df2 has a "date.time.start" and "date.time.end" column, as well as a column of "depth" and "shape". I would like, for each ID, to take my "unique_posix" column from df1 and go to df2 and find the "date.time.start" and "date.time.end" that it falls between or on. When I find the row that it corresponds to, I want to pull the "depth" and "shape" from df2 and copy it to new columns in df1 for that unique date/time.

I have tried doing this as a for loop with if/else, and I have tried doing this in dplyr.

df1<-data.frame(ID=c('SW12','SW12','SW12','SW12','SW12','SW13','SW13','SW13','SW13','SW13'), unique_posix=c('5/3/10 16:47','5/3/10 16:53','5/3/10 17:00', '5/3/10 18:00','5/3/10/ 18:12','8/15/10 17:13','8/15/10 17:18','8/15/10 17:37','8/15/10 18:00','8/15/10 18:52'))

df2<- data.frame(ID=c('SW12','SW12','SW12','SW12','SW12','SW13','SW13','SW13','SW13','SW13'), Date.Time.Start=c('5/3/10 15:57','5/3/10 16:18', '5/3/10 16:55','5/3/10 17:36','5/3/10 18:17','8/15/10 16:55','8/15/10 17:28','8/15/10 17:54', '8/15/10 18:55','8/15/10 19:20'), Date.Time.End=c('5/3/10 16:09','5/3/10 16:44','5/3/10 17:28', '5/3/10 18:08', '5/3/10 18:49', '8/15/10 17:22', '8/15/10 17:52','8/15/10 18:06','8/15/10 19:15','8/15/10 19:40'), Shape=c('U','U','V','Square','U','U','U','Square','V','U'), Depth=c(1,2,3,4,5,6,7,8,9,10))

I would like df1 to end up looking like:

df1b<-data.frame(ID=c('SW12','SW12','SW12','SW12','SW12','SW13','SW13','SW13','SW13','SW13'), unique_posix=c('5/3/10 16:47','5/3/10 16:53','5/3/10 17:00', '5/3/10 18:00','5/3/10/ 18:12','8/15/10 17:13','8/15/10 17:18','8/15/10 17:37','8/15/10 18:00','8/15/10 18:52'), Dive.Shape=c(NA,NA,'V','Square',NA,'U','U','U','Square', NA),Dive.Depth=c(NA,NA,3,4,NA,6,6,7,8,NA))

I've converted the date/times to POSIXct/lt:

library(dplyr)
df1 <- df1 %>% 
  mutate(
    ID = factor(ID),
    unique_posix = mdy_hm(unique_posix)
  )
class(df1$unique_posix)

df2 <- df2 %>% 
  mutate(
    ID = factor(ID),
    Date.Time.Start = mdy_hm(Date.Time.Start),
    Date.Time.End = mdy_hm(Date.Time.End)
  )
class(df2$Date.Time.Start)

As a for-loop I have tried:

df1b<-df1
for (i in 1:nrow(df1)) {
  if (df1$unique_posix %within% interval(df2$Date.Time.Start, df2$Date.Time.End)) {
    df1b$Dive.Shape<-df2$Shape
    df1b$Dive.Depth<-df2$Depth
  }
  else {
    df1b$Dive.Shape<-NA
    df2b$Dive.Depth<-NA
  }
}

In dplyr I was trying something like this:

df1b<-inner_join(df1, df2, by="DeployID")
df1b %>% rowwise() %>%
  mutate(Dive.Shape=ifelse(between(unique_posix, Date.Time.Start, Date.Time.End),Shape,NA )) %>%
mutate(Dive.Depth=ifelse(between(unique_posix, Date.Time.Start, Date.Time.End),Depth,NA ))
  arrange(DeployID,desc(unique_posix)) %>%
  distinct(unique_posix)

None of this seems to be working, but I feel like I'm close?

I would like to end up with my df1b having two extra columns of Dive.Shape and Dive.Depth, that would contain an "NA" if the unique_posix date/time didn't fall within or on a Date.Time.Start and Date.Time.End range in the df2 frame [for each ID]. The columns would contain values from df2's Shape and df2's Depth column if the df1's unique_posix fell between or on the df2's Date.Time.Start or Date.Time.End columns.

Thank you for any help I can get on this!

Upvotes: 2

Views: 805

Answers (3)

Ben
Ben

Reputation: 30494

If you still want to pursue the dplyr solution, try this:

inner_join(df1, df2, by = "ID") %>%
  rowwise() %>%
  filter (between(unique_posix, Date.Time.Start, Date.Time.End)) %>%
  right_join(df1, by = c("ID", "unique_posix")) %>%
  dplyr::select (-c(Date.Time.Start, Date.Time.End), Dive.Shape = Shape, Dive.Depth = Depth)

Upvotes: 1

Cole
Cole

Reputation: 11255

With data.table this is relatively simple with a non-equi update join:

library(data.table)
setDT(df1)
setDT(df2)

df1[df2
    , on = .(ID
             , unique_posix > Date.Time.Start
                , unique_posix < Date.Time.End)
    , `:=`(Dive.Shape = Shape, Dive.Depth = Depth)]

df1

> df1
      ID        unique_posix Dive.Shape Dive.Depth
 1: SW12 2010-05-03 16:47:00       <NA>         NA
 2: SW12 2010-05-03 16:53:00       <NA>         NA
 3: SW12 2010-05-03 17:00:00          V          3
 4: SW12 2010-05-03 18:00:00     Square          4
 5: SW12 2010-05-03 18:12:00       <NA>         NA
 6: SW13 2010-08-15 17:13:00          U          6
 7: SW13 2010-08-15 17:18:00          U          6
 8: SW13 2010-08-15 17:37:00          U          7
 9: SW13 2010-08-15 18:00:00     Square          8
10: SW13 2010-08-15 18:52:00       <NA>         NA

See also: How to do a data.table rolling join?

Upvotes: 0

phargart
phargart

Reputation: 729

I think you are. The issue is that in the data.frames, the dates/times are saved as characters.

apply(df1, 2, class) 
          ID unique_posix 
>  "character"  "character" 

apply(df2, 2, class)
             ID Date.Time.Start   Date.Time.End           Shape           Depth 
    "character"     "character"     "character"     "character"     "character" 

In reality, you want to convert unique_posix, Date.Time.Start and Date.Time.End to dates/times. Possibly use strptime()? I think that the comparisons would work, but I didn't verify them yet. I need to go soon, but I wanted to give you something anyway.

Upvotes: 0

Related Questions