Reputation: 91
I have a large matrix similar to the next example that I create (I have 70 columns and millions of rows):
a <- seq(as.IDate("2011-12-30"), as.IDate("2014-01-04"), by="days")
data <- data.table(ID = 1:length(a), date1 = a)
I want to extract all those lines that are in IDs, it contains the ID of the individual, and the dates that I need to extract from that individual. An individual can have multiple lines.
a <- seq(as.IDate("2011-12-30"), as.IDate("2014-01-04"), by="week")
b <- seq(as.IDate("2012-01-01"), as.IDate("2014-01-06"), by="week")
IDs <- data.table(ID = 1:length(a), date1 = a, date2 = b)
Currently, my solution is not very fast, what would be better?
A <- list()
for(i in 1:dim(IDs)[1]){
A[[i]] <- data[ID == IDs[i,ID] & (date1 %between% IDs[i,.(date1,date2)]),]
}
Upvotes: 3
Views: 141
Reputation: 25225
I think you are looking for a non-equi inner join:
IDs[data, on=.(ID, date1<=date1, date2>=date1), nomatch=0L, .(ID, date1=i.date1)]
Or associatively,
data[IDs, on=.(ID, date1>=date1, date1<=date2), nomatch=0L, .(ID, date1=x.date1)]
Or viewing it as a non-equi semi-join:
data[IDs[data, on=.(ID, date1<=date1, date2>=date1), nomatch=0L, which=TRUE]]
output:
ID date1
1: 1 2011-12-30
Upvotes: 3