Reputation: 3461
Consider this set of time points:
time1 <- as.Date("2011/04/05")
time2 <- as.Date("2011/05/17")
time3 <- as.Date("2011/06/27")
time4 <- as.Date("2011/08/16")
time5 <- as.Date("2011/10/05")
time6 <- as.Date("2011/11/21")
I want to create a vector containing all pairwise durations between the time points. For 6 points in time, this is manageable, but already tedious:
time.elapsed <- as.numeric(c(abs(time1-time2),abs(time1-time3),
abs(time1-time4),abs(time1-time5),
abs(time1-time6),abs(time2-time3),
abs(time2-time4),abs(time2-time5),
abs(time2-time6),abs(time3-time4),
abs(time3-time5),abs(time3-time6),
abs(time4-time5),abs(time4-time6),
abs(time5-time6)))
time.elapsed
[1] 42 83 133 183 230 41 91 141 188 50 100 147 50 97 47
How can I simplify this code for many more time points (say, 15 or 20)? It would be nice to keep the format as above (first all durations between time point 1 and all others, then time point 2 and all others except 1, etc.). Thank you.
Upvotes: 0
Views: 71
Reputation: 4520
Here is another solution using sapply:
# Data from the example
dt <- as.Date(c("2011/04/05",
"2011/05/17",
"2011/06/27",
"2011/08/16",
"2011/10/05",
"2011/11/21")
)
# Difftime with sapply
unlist(sapply(seq_along(dt), function(i) difftime(dt[-(1:i)], dt[i])))
# [1] 42 83 133 183 230 41 91 141 188 50 100 147 50 97 47
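One caveat, as far as I understand difftime: with the default units = "auto" it picks the unit from the smallest difference in each call, so a repeated date (a difference of 0) could make that call report seconds instead of days. A minimal sketch that pins the unit, assuming you always want days; pairwise_days is a hypothetical helper name, not part of the answer above:

```r
dt <- as.Date(c("2011/04/05", "2011/05/17", "2011/06/27",
                "2011/08/16", "2011/10/05", "2011/11/21"))

# Hypothetical helper: durations from each time point to all later ones,
# always expressed in days and returned as a plain numeric vector.
pairwise_days <- function(d) {
  unlist(lapply(seq_along(d), function(i) {
    # d[-seq_len(i)] drops the first i elements, leaving only later points
    abs(as.numeric(difftime(d[-seq_len(i)], d[i], units = "days")))
  }))
}

time.elapsed <- pairwise_days(dt)
time.elapsed
# [1]  42  83 133 183 230  41  91 141 188  50 100 147  50  97  47
```

The abs() mirrors the question's code, so the result does not depend on the dates being sorted.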
The difference between @Wimpel's approach and mine is negligible as long as we are working with small datasets. With relatively big datasets, however, you will notice a difference in performance (on my laptop I can see it starting from N > 500).
Of course, sapply doesn't outperform outer by that much, and both approaches only work with relatively small datasets. However, sapply seems to be the better choice when the length of your dataset is greater than several hundred.
Moreover, on my computer, when N = 5500, outer crashes with Error: cannot allocate vector of size 230.8 Mb, but sapply still works even when N = 15000, with an elapsed time of around 90 seconds. You can check this on your own machine.
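A rough way to try the comparison yourself. This is only a sketch: the input is synthetic (N consecutive days, my assumption), timings will differ between machines, and matrix_mb is a hypothetical helper that just does the arithmetic for one dense N x N matrix of doubles.

```r
# Synthetic input (an assumption): N consecutive daily dates.
N <- 1000
dt <- as.Date("2011-01-01") + seq_len(N) - 1

# Approach 1: build the full N x N matrix, then keep the lower triangle.
t_outer <- system.time({
  m <- outer(dt, dt, "-")
  res_outer <- as.numeric(m[lower.tri(m)])
})

# Approach 2: one difftime call per starting point, no N x N matrix.
t_sapply <- system.time({
  res_sapply <- unlist(sapply(seq_along(dt), function(i) {
    difftime(dt[-(1:i)], dt[i], units = "days")
  }))
})

# Both approaches should give the same N * (N - 1) / 2 durations.
stopifnot(isTRUE(all.equal(res_outer, res_sapply)))
rbind(outer = t_outer, sapply = t_sapply)

# Size in Mb of one dense N x N matrix of doubles (8 bytes per value).
matrix_mb <- function(n) n^2 * 8 / 1024^2
matrix_mb(5500)  # about 230.8, the size named in the allocation error
```

The last line suggests why outer fails first: it has to hold at least one full N x N matrix in memory, while the sapply version only ever keeps the N * (N - 1) / 2 results.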
Upvotes: 1
Reputation: 27732
I think this will help you
df <- data.frame(time = c(time1, time2, time3, time4, time5, time6) )
outer( df$time, df$time , "-" )
Time differences in days
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0  -42  -83 -133 -183 -230
[2,]   42    0  -41  -91 -141 -188
[3,]   83   41    0  -50 -100 -147
[4,]  133   91   50    0  -50  -97
[5,]  183  141  100   50    0  -47
[6,]  230  188  147   97   47    0
To get your preferred outcome, take the lower triangle of the matrix (storing the matrix first avoids computing it twice):
m <- outer( df$time, df$time , "-" )
m[lower.tri( m )]
Time differences in days
[1] 42 83 133 183 230 41 91 141 188 50 100 147 50 97 47
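A related variant, not from the answer above: stats::dist stores only the lower triangle, column by column, so on a single numeric column it should return the absolute pairwise differences in the same order. A minimal sketch, assuming the dates are first converted to day counts with as.numeric:

```r
dt <- as.Date(c("2011/04/05", "2011/05/17", "2011/06/27",
                "2011/08/16", "2011/10/05", "2011/11/21"))

# Lower triangle of the outer() matrix, as a plain numeric vector.
m <- outer(dt, dt, "-")
from_outer <- as.numeric(m[lower.tri(m)])

# dist() on one numeric column gives |x_i - x_j| (default Euclidean method);
# as.numeric() drops the "dist" class and leaves the values in storage order.
from_dist <- as.numeric(dist(as.numeric(dt)))

from_dist
# [1]  42  83 133 183 230  41  91 141 188  50 100 147  50  97  47
stopifnot(isTRUE(all.equal(from_outer, from_dist)))
```

Because dist takes absolute differences, the result is non-negative even if the dates are not sorted, which matches the abs() calls in the question.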
Upvotes: 1