Andy
Andy

Reputation: 1

Identifying the sum of all possible pairs of observed dates within a specified range (in R)

I have a dataset that has a date object (one date per individual).
I'd like to get the sum of all possible pairs that meet pre-specified criteria. For example, I want the total number of pairs that could be "matched" based on the fact that their dates were within, say, 15 days of each other:

df <- data.frame("ID"=c(1:5), "Date"=c("2005-01-05","2005-01-08","2005-01-21","2005-01-22","2005-02-04"))

df

ID1 matches with ID2, ID2 matches with ID3, ID2 matches with ID4 ID3 matches with ID4, ID3 matches with ID5 ID4 matches with ID5

All I want for output is the sum of all possible pairs (in this case, n=6 possible pairs).

I've played with ddply and aggregate functions quite a bit for this problem, but really can't nail down where I'm going wrong. I suspect it has something to do with the fact that I have date objects. I'll spare you all of my elementary attempted, unsuccessful solutions.

And no, this is not homework. Somewhat new to R and this is part of a larger cluster analysis project I am working on.

result

Upvotes: 0

Views: 70

Answers (3)

Chris Holbrook
Chris Holbrook

Reputation: 2636

df <- data.frame("ID"=c(1:5), "Date"=c("2005-01-05","2005-01-08","2005-01-   
    21","2005-01-22","2005-02-04"))
df$Date <- as.Date(df$Date) #coerce to Date

#create a square matrix; each col contains same dates
m1 <- matrix(df$Date,nrow=nrow(df),ncol=nrow(df)) 
m2 <- t(m1) #transpose
m1
m2
#subtract the two matrices; and subset lower triangle of result
diff <- (m1-m2)[lower.tri(m1)] 
sum(abs(diff) <= 15) #count those that meet criteria

Upvotes: 1

HubertL
HubertL

Reputation: 19544

df <- data.frame("ID"=c(1:5), "Date"=c("2005-01-05","2005-01-08","2005-01-21","2005-01-22","2005-02-04"))
df$Date <- as.POSIXct(df$Date)
df$MaxDate <- df$Date+15*24*3600
library(sqldf)
result = sqldf("select a.ID, b.ID from df a inner join df b
               on a.Date between b.Date and b.MaxDate and a.ID<>b.ID")

Upvotes: 0

Jordan
Jordan

Reputation: 614

df <- data.frame("ID"=c(1:5), "Date"=c("2005-01-05","2005-01-08","2005-01-21","2005-01-22","2005-02-04"))

# Convert to a date field
df$Date <- as.Date(df$Date)
library(dplyr)

#  Create an index that will be used to prevent recomparing pairs that have already been counted
df$myid <- seq(1:length(df$Date))

group_by(df,Date) %>% summarise(my.x=length(Date[abs(difftime(Date,df$Date[myid:nrow(df)],units="days"))<=15])-1)

Upvotes: 0

Related Questions