Reputation: 61
I have 2 data frames as mentioned below:
df1 <- data.frame(ID=c(1,1,1,2,3,4,4), actual.date=c('10/01/1996','10/02/1996','5/01/2002','7/01/1999','9/01/2005','5/01/2006','2/03/2003'),
val=c(5,10,15,20,25,30,35))
dcis <- grep('date$',names(df1))
df1[dcis] <- lapply(df1[dcis],as.Date,'%m/%d/%Y')
df1
df2 <- data.frame(ID=c(1,1,1,2,3,4,4,4), before.date=c('10/1/1996','1/1/1998','1/1/2000','1/1/2001','1/1/2001','1/1/2001','10/1/2004','10/3/2004'), after.date=c('12/1/1996','9/30/2003','12/31/2004','3/31/2006','9/30/2006','9/30/2005','12/30/2004','11/28/2004'))
dcis <- grep('date$',names(df2))
df2[dcis] <- lapply(df2[dcis],as.Date,'%m/%d/%Y')
df2
Requirement -> I will start with each row of df2
and see how many rows in df1
lies within (inclusive range) date range specified in that row of df2
(grouped on ID).
For ex: for 1st row in df2
, there are 2 rows in df1
(1st and 2nd) which has ID=1 and lies in date range of 1st row of df2
. In the final output, I will sum 5+10 (from column 'val') and put against 1st row of df2
.
Upvotes: 1
Views: 57
Reputation: 887951
We can use a non-equi join in data.table
library(data.table)
val1 <- setDT(df1)[df2, sum(val), on = .(ID, actual.date >= before.date,
actual.date <= after.date), by = .EACHI]$V1
df2$val <- val1
df2$val
#[1] 15 15 15 NA 25 35 NA NA
Or another option is a fuzzy_join
library(fuzzyjoin)
library(dplyr)
fuzzy_right_join(df1, df2, by = c("ID",
"actual.date" = "before.date", "actual.date"= "after.date"),
match_fun = list(`==`, `>=`, `<=`)) %>%
group_by(ID = ID.y, before.date, after.date) %>%
summarise(val = sum(val, na.rm = TRUE))
# A tibble: 8 x 4
# Groups: ID, before.date [8]
# ID before.date after.date val
# <dbl> <date> <date> <dbl>
#1 1 1996-10-01 1996-12-01 15
#2 1 1998-01-01 2003-09-30 15
#3 1 2000-01-01 2004-12-31 15
#4 2 2001-01-01 2006-03-31 0
#5 3 2001-01-01 2006-09-30 25
#6 4 2001-01-01 2005-09-30 35
#7 4 2004-10-01 2004-12-30 0
#8 4 2004-10-03 2004-11-28 0
Upvotes: 1
Reputation: 4150
Maybe this a duplicate post
dplyr left_join by less than, greater than condition
df1 %>%
mutate(dummy=TRUE) %>%
left_join(df2 %>% mutate(dummy=TRUE)) %>%
filter(actual.date >= before.date, actual.date < after.date) %>%
select(-dummy)
We also have fuzzyjoin.
library(fuzzyjoin)
fuzzy_inner_join(df1,df2,
by = c("actual.date" = "before.date","actual.date" = "after.date","ID" = "ID"),
match_fun = list(`>=`, `<=`,`==`))
Upvotes: 0
Reputation: 102810
I am not sure if this is what your are after. Here is a base R solution
df2$res <- apply(df2,1, function(x) sum(df1$val[df1$ID == x["ID"]
& df1$actual.date>= x["before.date"]
& df1$actual.date<= x["after.date"]]))
such that
> df2
ID before.date after.date res
1 1 1996-10-01 1996-12-01 15
2 1 1998-01-01 2003-09-30 15
3 1 2000-01-01 2004-12-31 15
4 2 2001-01-01 2006-03-31 0
5 3 2001-01-01 2006-09-30 25
6 4 2001-01-01 2005-09-30 35
7 4 2004-10-01 2004-12-30 0
8 4 2004-10-03 2004-11-28 0
Upvotes: 0