nouse

Reputation: 3461

Create a vector of pairwise durations between many timepoints

Consider this set of time points:

time1 <- as.Date("2011/04/05")
time2 <- as.Date("2011/05/17")
time3 <- as.Date("2011/06/27")
time4 <- as.Date("2011/08/16")
time5 <- as.Date("2011/10/05")
time6 <- as.Date("2011/11/21")

I want to create a vector containing all pairwise durations between the time points. For 6 points in time, this is manageable, but already tedious:

time.elapsed <- as.numeric(c(abs(time1-time2),abs(time1-time3),
                             abs(time1-time4),abs(time1-time5),
                             abs(time1-time6),abs(time2-time3),
                             abs(time2-time4),abs(time2-time5),
                             abs(time2-time6),abs(time3-time4),
                             abs(time3-time5),abs(time3-time6),
                             abs(time4-time5),abs(time4-time6),
                             abs(time5-time6)))

time.elapsed
 [1]  42  83 133 183 230  41  91 141 188  50 100 147  50  97  47

How can I simplify this code for many more time points (say, 15 or 20)? It would be nice to keep the order shown above: first all durations between time point 1 and all others, then between time point 2 and all others except 1, and so on. Thank you.

Upvotes: 0

Views: 71

Answers (2)

utubun

Reputation: 4520

There is another solution using sapply:

# Data from the example
dt <- as.Date(c("2011/04/05", 
                "2011/05/17", 
                "2011/06/27", 
                "2011/08/16", 
                "2011/10/05", 
                "2011/11/21")
              )
# Difftime with sapply
unlist(sapply(seq_along(dt), function(i) difftime(dt[-(1:i)], dt[i])))
# [1]  42  83 133 183 230  41  91 141 188  50 100 147  50  97  47
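The one-liner above can be wrapped into a small reusable function. This is only a sketch: `pairwise_days` is a hypothetical name, and the explicit `units = "days"` and the `abs()` call are assumptions added here, so that the result does not depend on automatic unit selection and mirrors the `abs()` calls in the question when the input is not sorted.

```r
# Hypothetical helper, not part of the original answer.
# Returns all pairwise absolute differences in days, ordered as
# (1,2), (1,3), ..., (1,n), (2,3), ..., (n-1,n).
pairwise_days <- function(x) {
  n <- length(x)
  if (n < 2) return(numeric(0))  # no pairs to compare
  unlist(lapply(seq_len(n - 1), function(i) {
    # differences between point i and every later point
    abs(as.numeric(difftime(x[(i + 1):n], x[i], units = "days")))
  }))
}

dt <- as.Date(c("2011/04/05", "2011/05/17", "2011/06/27",
                "2011/08/16", "2011/10/05", "2011/11/21"))
pairwise_days(dt)
# [1]  42  83 133 183 230  41  91 141 188  50 100 147  50  97  47
```

Using `lapply` over `seq_len(n - 1)` avoids the empty last iteration and always returns a list, so `unlist` behaves the same for any input length.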

The difference between @Wimpel's approach and mine is negligible as long as we are working with small datasets. With relatively large datasets, however, you will notice a difference in performance (on my laptop it becomes visible from about N > 500):

[Figure: sapply vs outer, performance]

Of course, sapply doesn't outperform outer by very much, and both approaches only work for relatively small datasets. Still, sapply seems to be the better choice once the length of your dataset exceeds several hundred.

Moreover, on my computer outer crashes at N = 5500 with Error: cannot allocate vector of size 230.8 Mb, while sapply still works at N = 15000, with an elapsed time of around 90 seconds. You can check this on your own machine with the code provided.
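For anyone who wants to repeat the comparison, here is a minimal timing sketch. It is not the benchmark code used for the numbers above: the function names, the sample size `n`, and the randomly generated dates are all assumptions made for illustration, and timings will differ from machine to machine.

```r
set.seed(1)
n  <- 300                                   # hypothetical size; increase to see the gap
dt <- as.Date("2011-01-01") + sort(sample.int(5000, n))  # n distinct, sorted dates

# sapply-style approach: one difftime call per time point
pairwise_sapply <- function(x) {
  unlist(lapply(seq_along(x), function(i) {
    as.numeric(difftime(x[-(1:i)], x[i], units = "days"))
  }))
}

# outer approach: full n x n matrix, then keep the lower triangle
pairwise_outer <- function(x) {
  m <- outer(x, x, "-")
  as.numeric(m[lower.tri(m)])
}

t_sapply <- system.time(res_sapply <- pairwise_sapply(dt))
t_outer  <- system.time(res_outer  <- pairwise_outer(dt))

# Both approaches should give the same values in the same order
isTRUE(all.equal(res_sapply, res_outer))
rbind(sapply = t_sapply[1:3], outer = t_outer[1:3])
```

The outer version builds the full n x n matrix before discarding more than half of it, which is consistent with the allocation error reported above for large N.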

Upvotes: 1

Wimpel

Reputation: 27732

I think this will help you

df <- data.frame(time = c(time1, time2, time3, time4, time5, time6) )

outer( df$time, df$time , "-" )

Time differences in days
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0  -42  -83 -133 -183 -230
[2,]   42    0  -41  -91 -141 -188
[3,]   83   41    0  -50 -100 -147
[4,]  133   91   50    0  -50  -97
[5,]  183  141  100   50    0  -47
[6,]  230  188  147   97   47    0

To get your preferred output, take the lower triangle of the matrix:

d <- outer( df$time, df$time , "-" )  # compute the matrix once
d[lower.tri( d )]

Time differences in days
 [1]  42  83 133 183 230  41  91 141 188  50 100 147  50  97  47
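The result above is still a difftime object, whereas the question asks for a plain numeric vector. A short sketch of two ways to get there: wrapping the lower triangle in `as.numeric()`, or, as a different technique not used in this answer, calling `dist()` from the stats package on the numeric day counts. `dist()` stores only the lower triangle, column by column, which happens to match the requested order. The variable names below are chosen for illustration.

```r
time <- as.Date(c("2011/04/05", "2011/05/17", "2011/06/27",
                  "2011/08/16", "2011/10/05", "2011/11/21"))

# Option 1: lower triangle of the outer() matrix, converted to numeric
d <- outer(time, time, "-")
via_outer <- as.numeric(d[lower.tri(d)])

# Option 2: dist() on the day counts; absolute differences,
# stored as the lower triangle in column order
via_dist <- as.vector(dist(as.numeric(time)))

via_outer
# [1]  42  83 133 183 230  41  91 141 188  50 100 147  50  97  47
isTRUE(all.equal(via_outer, via_dist))
```

Note that the lower triangle of `outer()` is only non-negative when the dates are sorted in ascending order; `dist()` returns absolute differences regardless of order.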

Upvotes: 1
