Reputation: 45
I have two data frames (df1 and df2); they each have an ID column, and are organized by ID number with many rows per ID for each dataframe. df1 has a "unique_posix" column, and df2 has a "date.time.start" and "date.time.end" column, as well as a column of "depth" and "shape". I would like, for each ID, to take my "unique_posix" column from df1 and go to df2 and find the "date.time.start" and "date.time.end" that it falls between or on. When I find the row that it corresponds to, I want to pull the "depth" and "shape" from df2 and copy it to new columns in df1 for that unique date/time.
I have tried doing this as a for loop with if/else, and I have tried doing this in dplyr.
df1<-data.frame(ID=c('SW12','SW12','SW12','SW12','SW12','SW13','SW13','SW13','SW13','SW13'), unique_posix=c('5/3/10 16:47','5/3/10 16:53','5/3/10 17:00', '5/3/10 18:00','5/3/10/ 18:12','8/15/10 17:13','8/15/10 17:18','8/15/10 17:37','8/15/10 18:00','8/15/10 18:52'))
df2<- data.frame(ID=c('SW12','SW12','SW12','SW12','SW12','SW13','SW13','SW13','SW13','SW13'), Date.Time.Start=c('5/3/10 15:57','5/3/10 16:18', '5/3/10 16:55','5/3/10 17:36','5/3/10 18:17','8/15/10 16:55','8/15/10 17:28','8/15/10 17:54', '8/15/10 18:55','8/15/10 19:20'), Date.Time.End=c('5/3/10 16:09','5/3/10 16:44','5/3/10 17:28', '5/3/10 18:08', '5/3/10 18:49', '8/15/10 17:22', '8/15/10 17:52','8/15/10 18:06','8/15/10 19:15','8/15/10 19:40'), Shape=c('U','U','V','Square','U','U','U','Square','V','U'), Depth=c(1,2,3,4,5,6,7,8,9,10))
I would like df1 to end up looking like:
df1b<-data.frame(ID=c('SW12','SW12','SW12','SW12','SW12','SW13','SW13','SW13','SW13','SW13'), unique_posix=c('5/3/10 16:47','5/3/10 16:53','5/3/10 17:00', '5/3/10 18:00','5/3/10/ 18:12','8/15/10 17:13','8/15/10 17:18','8/15/10 17:37','8/15/10 18:00','8/15/10 18:52'), Dive.Shape=c(NA,NA,'V','Square',NA,'U','U','U','Square', NA),Dive.Depth=c(NA,NA,3,4,NA,6,6,7,8,NA))
I've converted the date/times to POSIXct/lt:
library(dplyr)
df1 <- df1 %>%
mutate(
ID = factor(ID),
unique_posix = mdy_hm(unique_posix)
)
class(df1$unique_posix)
df2 <- df2 %>%
mutate(
ID = factor(ID),
Date.Time.Start = mdy_hm(Date.Time.Start),
Date.Time.End = mdy_hm(Date.Time.End)
)
class(df2$Date.Time.Start)
As a for-loop I have tried:
df1b<-df1
for (i in 1:nrow(df1)) {
if (df1$unique_posix %within% interval(df2$Date.Time.Start, df2$Date.Time.End)) {
df1b$Dive.Shape<-df2$Shape
df1b$Dive.Depth<-df2$Depth
}
else {
df1b$Dive.Shape<-NA
df2b$Dive.Depth<-NA
}
}
In dplyr I was trying something like this:
df1b<-inner_join(df1, df2, by="DeployID")
df1b %>% rowwise() %>%
mutate(Dive.Shape=ifelse(between(unique_posix, Date.Time.Start, Date.Time.End),Shape,NA )) %>%
mutate(Dive.Depth=ifelse(between(unique_posix, Date.Time.Start, Date.Time.End),Depth,NA ))
arrange(DeployID,desc(unique_posix)) %>%
distinct(unique_posix)
None of this seems to be working, but I feel like I'm close?
I would like to end up with my df1b having two extra columns of Dive.Shape and Dive.Depth, that would contain an "NA" if the unique_posix date/time didn't fall within or on a Date.Time.Start and Date.Time.End range in the df2 frame [for each ID]. The columns would contain values from df2's Shape and df2's Depth column if the df1's unique_posix fell between or on the df2's Date.Time.Start or Date.Time.End columns.
Thank you for any help I can get on this!
Upvotes: 2
Views: 805
Reputation: 30494
If you still want to pursue the dplyr solution, try this:
inner_join(df1, df2, by = "ID") %>%
rowwise() %>%
filter (between(unique_posix, Date.Time.Start, Date.Time.End)) %>%
right_join(df1, by = c("ID", "unique_posix")) %>%
dplyr::select (-c(Date.Time.Start, Date.Time.End), Dive.Shape = Shape, Dive.Depth = Depth)
Upvotes: 1
Reputation: 11255
With data.table
this is relatively simple with a non-equi update join:
library(data.table)
setDT(df1)
setDT(df2)
df1[df2
, on = .(ID
, unique_posix > Date.Time.Start
, unique_posix < Date.Time.End)
, `:=`(Dive.Shape = Shape, Dive.Depth = Depth)]
df1
> df1
ID unique_posix Dive.Shape Dive.Depth
1: SW12 2010-05-03 16:47:00 <NA> NA
2: SW12 2010-05-03 16:53:00 <NA> NA
3: SW12 2010-05-03 17:00:00 V 3
4: SW12 2010-05-03 18:00:00 Square 4
5: SW12 2010-05-03 18:12:00 <NA> NA
6: SW13 2010-08-15 17:13:00 U 6
7: SW13 2010-08-15 17:18:00 U 6
8: SW13 2010-08-15 17:37:00 U 7
9: SW13 2010-08-15 18:00:00 Square 8
10: SW13 2010-08-15 18:52:00 <NA> NA
See also: How to do a data.table rolling join?
Upvotes: 0
Reputation: 729
I think you are. The issue is that in the data.frames, the dates/times are saved as characters.
apply(df1, 2, class)
ID unique_posix
> "character" "character"
apply(df2, 2, class)
ID Date.Time.Start Date.Time.End Shape Depth
"character" "character" "character" "character" "character"
In reality, you want to convert unique_posix
, Date.Time.Start
and Date.Time.End
to dates/times. Possibly use strptime()
? I think that the comparisons would work, but I didn't verify them yet. I need to go soon, but I wanted to give you something anyway.
Upvotes: 0