Reputation: 3461
From a very simple dataframe like
time1 <- as.Date("2010/10/10")
time2 <- as.Date("2010/10/11")
time3 <- as.Date("2010/10/12")
test <- data.frame(Sample=c("A","B", "C"), Date=c(time1, time2, time3))
how can I obtain a matrix of pairwise temporal distances (elapsed time in days) between the samples A, B and C?
A B C
A 0 1 2
B 1 0 1
C 2 1 0
Edit: changed the format of the dates. Sorry for the inconvenience.
Upvotes: 8
Views: 1835
Reputation: 13807
A really fast solution using a data.table approach in two steps:
# load library
library(reshape)
library(data.table)
# 1. Get all possible combinations of pairs of dates in long format
df <- expand.grid.df(test, test)
colnames(df) <- c("Sample", "Date", "Sample2", "Date2")
# 2. Calculate distances in days, weeks or hours, minutes etc
setDT(df)[, datedist := difftime(Date2, Date, units ="days")]
df
#> Sample Date Sample2 Date2 datedist
#> 1: A 2010-10-10 A 2010-10-10 0 days
#> 2: B 2010-10-11 A 2010-10-10 -1 days
#> 3: C 2010-10-12 A 2010-10-10 -2 days
#> 4: A 2010-10-10 B 2010-10-11 1 days
#> 5: B 2010-10-11 B 2010-10-11 0 days
#> 6: C 2010-10-12 B 2010-10-11 -1 days
#> 7: A 2010-10-10 C 2010-10-12 2 days
#> 8: B 2010-10-11 C 2010-10-12 1 days
#> 9: C 2010-10-12 C 2010-10-12 0 days
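If you need the wide matrix from the question rather than the long table, the long result can be pivoted. Below is a minimal base R sketch of that idea, not part of the answer above: it builds the pairs with expand.grid() on row indices instead of expand.grid.df(), and the names idx, long and m are my own.

```r
# Same toy data as in the question
test <- data.frame(
  Sample = c("A", "B", "C"),
  Date   = as.Date(c("2010-10-10", "2010-10-11", "2010-10-12"))
)

# All ordered pairs of row indices (long format)
idx <- expand.grid(i = seq_len(nrow(test)), j = seq_len(nrow(test)))

long <- data.frame(
  Sample   = as.character(test$Sample[idx$i]),
  Sample2  = as.character(test$Sample[idx$j]),
  datedist = as.numeric(difftime(test$Date[idx$j], test$Date[idx$i], units = "days"))
)

# Pivot to a Sample x Sample2 matrix; each cell holds exactly one value,
# so sum() just returns it. abs() drops the sign.
m <- tapply(abs(long$datedist), list(long$Sample, long$Sample2), sum)
m
```

Drop the abs() if you want signed differences instead.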
Upvotes: 5
Reputation: 73265
Using outer()
You don't need to work with a data frame. In your example, we can collect your dates in a single vector and use outer()
x <- c(time1, time2, time3)
abs(outer(x, x, "-"))
[,1] [,2] [,3]
[1,] 0 1 2
[2,] 1 0 1
[3,] 2 1 0
Note that I have added an abs() outside, so that you only get positive time differences, i.e. the differences "today - yesterday" and "yesterday - today" are both 1.
If your data are pre-stored in a data frame, you can extract that column as a vector and then proceed.
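For instance, assuming the data frame test from the question, a sketch of that could look like this (converting the dates to plain numbers first and labelling the result with the sample names are my own additions):

```r
test <- data.frame(
  Sample = c("A", "B", "C"),
  Date   = as.Date(c("2010-10-10", "2010-10-11", "2010-10-12"))
)

# Dates as plain numbers (days since 1970-01-01), so the result is an
# ordinary numeric matrix rather than a difftime object
d <- as.numeric(test$Date)

m <- abs(outer(d, d, "-"))
dimnames(m) <- list(as.character(test$Sample), as.character(test$Sample))
m
#>   A B C
#> A 0 1 2
#> B 1 0 1
#> C 2 1 0
```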
Using dist()
As Konrad mentioned, dist() is often used to compute a distance matrix. Its greatest advantage is that it computes only one triangle of the matrix (the diagonal is 0) and copies it to fill the rest. outer(), on the other hand, computes every matrix element because it knows nothing about the symmetry.
However, dist() takes numeric input and only computes certain kinds of distance. See ?dist:
Arguments:
x: a numeric matrix, data frame or ‘"dist"’ object.
method: the distance measure to be used. This must be one of
‘"euclidean"’, ‘"maximum"’, ‘"manhattan"’, ‘"canberra"’,
‘"binary"’ or ‘"minkowski"’. Any unambiguous substring can
be given.
But we can work around this and still use it.
A Date object can be coerced to numbers once you give it an origin. With
x <- as.numeric(x - min(x))
we get the number of days since the first day on record. Now we can use dist() with the default Euclidean distance:
y <- as.matrix(dist(x, diag = TRUE, upper = TRUE))
rownames(y) <- colnames(y) <- c("A", "B", "C")
A B C
A 0 1 2
B 1 0 1
C 2 1 0
Why I put outer() as my first example
In principle, a time difference is signed. In that case, outer(x, x, "-") is more appropriate. I added the abs() later, because it seems that you intentionally want positive results.
Also, outer() has far broader use than dist(). Have a look at my answer here. That OP asks about computing the Hamming distance, which is really a kind of bitwise distance.
Upvotes: 6
Reputation: 38500
Here is a method that uses combn and matrix indexing.
# data
Sample=c("A","B", "C")
Date=as.Date(c("02/10/10", "02/10/11", "02/10/12"), format="%y/%m/%d")
# build a matrix to be filled
myMat <- matrix(0, length(Sample), length(Sample), dimnames=list(Sample, Sample))
# get all pairwise combinations (upper triangle)
samplePairs <- t(combn(Sample, 2))
# add the reverse combination (lower triangle)
samplePairs <- rbind(samplePairs, cbind(samplePairs[,2], samplePairs[,1]))
# calculate differences
diffs <- combn(Date, 2, FUN=diff)
# fill in differences using matrix indexing
myMat[samplePairs] <- diffs
Upvotes: 1
Reputation: 5089
To get actual day counts, you can convert the dates to days since some pre-defined date and then use dist. Example below (I converted your dates, since I doubt they were being parsed the way you expected):
time1 <- as.Date("02/10/10","%m/%d/%y")
time2 <- as.Date("02/10/11","%m/%d/%y")
time3 <- as.Date("02/10/12","%m/%d/%y")
test <- data.frame(Sample=c("A","B", "C"), Date=c(time1, time2, time3))
days_s2010 <- difftime(test$Date,as.Date("01/01/10","%m/%d/%y"))
dist_days <- as.matrix(dist(days_s2010,diag=TRUE,upper=TRUE))
rownames(dist_days) <- test$Sample; colnames(dist_days) <- test$Sample
dist_days
then prints out:
> dist_days
A B C
A 0 365 730
B 365 0 365
C 730 365 0
Actually, dist doesn't need the dates converted to days since some reference date; simply doing dist(test$Date) will work for days.
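A short sketch of that shortcut, using the same three dates as above. It assumes dist() treats the Date column as its underlying day counts, which is what the comparison against the explicit as.numeric() version checks; the names direct and via_numeric are mine.

```r
test <- data.frame(
  Sample = c("A", "B", "C"),
  Date   = as.Date(c("2010-02-10", "2011-02-10", "2012-02-10"))
)

# dist() applied directly to the Date column
direct <- as.matrix(dist(test$Date))

# Same computation with an explicit conversion to day counts
via_numeric <- as.matrix(dist(as.numeric(test$Date)))

dimnames(direct) <- list(as.character(test$Sample), as.character(test$Sample))
direct
```

If both matrices agree, the intermediate difftime() step is only needed when you want the distances in a unit other than days.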
Upvotes: 8