nouse

Reputation: 3461

Temporal distance matrix from dates

From a very simple dataframe like

    time1 <- as.Date("2010/10/10")
    time2 <- as.Date("2010/10/11")
    time3 <- as.Date("2010/10/12")
    test <- data.frame(Sample=c("A","B", "C"), Date=c(time1, time2, time3))

How can I obtain a matrix of pairwise temporal distances (elapsed time in days between samples) for the Samples A, B, C?

   A  B  C
A  0  1  2
B  1  0  1
C  2  1  0

Edit: changed the format of the dates. Sorry for the inconvenience.

Upvotes: 8

Views: 1835

Answers (4)

rafa.pereira

Reputation: 13807

A fast solution using data.table, in two steps. Note that it returns the signed differences in long format rather than as a matrix.

# load libraries (expand.grid.df() comes from reshape)
library(reshape)
library(data.table)

# 1. Get all possible combinations of pairs of dates in long format
df <- expand.grid.df(test, test)
colnames(df) <- c("Sample", "Date", "Sample2", "Date2")

# 2. Calculate the differences; units can be "secs", "mins", "hours", "days" or "weeks"
setDT(df)[, datedist := difftime(Date2, Date, units ="days")]

df
#>    Sample       Date Sample2      Date2 datedist
#> 1:      A 2010-10-10       A 2010-10-10   0 days
#> 2:      B 2010-10-11       A 2010-10-10  -1 days
#> 3:      C 2010-10-12       A 2010-10-10  -2 days
#> 4:      A 2010-10-10       B 2010-10-11   1 days
#> 5:      B 2010-10-11       B 2010-10-11   0 days
#> 6:      C 2010-10-12       B 2010-10-11  -1 days
#> 7:      A 2010-10-10       C 2010-10-12   2 days
#> 8:      B 2010-10-11       C 2010-10-12   1 days
#> 9:      C 2010-10-12       C 2010-10-12   0 days
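
To get from a long table of pairs to the matrix the question asks for, the pairs still have to be pivoted. A minimal base R sketch, assuming unsigned day counts are wanted; `idx`, `long` and `dist_mat` are illustrative names, and `expand.grid()` on row indices stands in here for `expand.grid.df()`:

```r
test <- data.frame(
  Sample = c("A", "B", "C"),
  Date   = as.Date(c("2010-10-10", "2010-10-11", "2010-10-12"))
)

# All ordered pairs of row indices (i, j)
idx <- expand.grid(i = seq_len(nrow(test)), j = seq_len(nrow(test)))

# Long format: one row per ordered pair, with the absolute difference in days
long <- data.frame(
  Sample   = as.character(test$Sample[idx$i]),
  Sample2  = as.character(test$Sample[idx$j]),
  datedist = abs(as.numeric(test$Date[idx$j] - test$Date[idx$i]))
)

# Pivot to wide. Each (Sample, Sample2) pair occurs exactly once,
# so sum() simply returns that single value.
dist_mat <- tapply(long$datedist, list(long$Sample, long$Sample2), sum)
dist_mat
```

Dropping the `abs()` would keep the sign of each difference, as in the data.table output.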

Upvotes: 5

Zheyuan Li

Reputation: 73265

Using outer()

You don't need to work with a data frame. In your example, we can collect your dates in a single vector and use outer():

x <- c(time1, time2, time3)
abs(outer(x, x, "-"))

     [,1] [,2] [,3]
[1,]    0    1    2
[2,]    1    0    1
[3,]    2    1    0

Note that I have wrapped the call in abs(), so that you only get positive time differences, i.e., "today - yesterday" and "yesterday - today" are both 1.

If your data are pre-stored in a data frame, you can extract that column as a vector and then proceed.
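
As a rough sketch of the data frame case, assuming the `test` data frame from the question: the Date column is pulled out, converted to plain numbers, and the sample names are attached as row and column labels. The names `days` and `m` are only illustrative.

```r
test <- data.frame(
  Sample = c("A", "B", "C"),
  Date   = as.Date(c("2010-10-10", "2010-10-11", "2010-10-12"))
)

# Dates as plain numbers (days since 1970-01-01)
days <- as.numeric(test$Date)

# All pairwise differences, made unsigned
m <- abs(outer(days, days, "-"))

# Label rows and columns with the sample names
dimnames(m) <- list(as.character(test$Sample), as.character(test$Sample))
m
```

Converting to numeric first yields an ordinary numeric matrix rather than a difftime object.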

Using dist()

As Konrad mentioned, dist() is often used to compute a distance matrix. Its greatest advantage is that it computes only one triangle of the matrix (the diagonal is 0) and copies the rest. outer(), on the other hand, knows nothing about the symmetry and computes every element.
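
A small sketch of that storage difference, assuming three samples taken 0, 1 and 2 days after the first one (`days`, `d` and `full` are illustrative names): a "dist" object holds only the n(n-1)/2 pairwise values, and as.matrix() expands it to the full symmetric matrix.

```r
days <- c(0, 1, 2)   # days since the first sample (assumed example values)
n <- length(days)

d <- dist(days)

# Only one triangle is stored: n * (n - 1) / 2 values
length(d) == n * (n - 1) / 2

# Expand to the full symmetric matrix with a zero diagonal
full <- as.matrix(d)
full
```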

However, dist() takes numerical vectors, and only computes some classes of distance. See ?dist

Arguments:

       x: a numeric matrix, data frame or ‘"dist"’ object.

  method: the distance measure to be used. This must be one of
          ‘"euclidean"’, ‘"maximum"’, ‘"manhattan"’, ‘"canberra"’,
          ‘"binary"’ or ‘"minkowski"’.  Any unambiguous substring can
          be given.

But we can work around this.

A Date object can be converted to a number once you fix an origin. With

x <- as.numeric(x - min(x))

we get number of days since the first day in record. Now we can use dist() with the default Euclidean distance:

y <- as.matrix(dist(x, diag = TRUE, upper = TRUE))
rownames(y) <- colnames(y) <- c("A", "B", "C")

  A B C
A 0 1 2
B 1 0 1
C 2 1 0

Why I put outer() as my first example

In principle, a time difference is signed. In that case,

outer(x, x, "-")

is more appropriate. I added abs() afterwards because you seem to want positive results.

Also, outer() has far broader use than dist(). Have a look at my answer here. That OP asks for computing Hamming distance, which is really a kind of bitwise distance.
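
To illustrate that flexibility while staying with dates: outer() accepts any vectorised function of two arguments, so the same pattern can return, say, signed differences in weeks. This is only a sketch; the dates and the helper `weeks_apart()` are made up for the example.

```r
dates <- as.Date(c("2010-10-10", "2010-10-17", "2010-10-31"))

# Hypothetical helper: signed difference in weeks between dates j and i.
# It is vectorised over i and j, which is what outer() requires.
weeks_apart <- function(i, j) {
  as.numeric(difftime(dates[j], dates[i], units = "weeks"))
}

idx <- seq_along(dates)
m <- outer(idx, idx, weeks_apart)
m
```

Passing indices rather than the dates themselves keeps the Date class intact inside the helper.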

Upvotes: 6

lmo

Reputation: 38500

Here is a method that uses combn and matrix indexing.

# data
Sample <- c("A", "B", "C")
Date <- as.Date(c("02/10/10", "02/10/11", "02/10/12"), format = "%y/%m/%d")
# build a matrix to be filled
myMat <- matrix(0, length(Sample), length(Sample), dimnames=list(Sample, Sample))

# get all pairwise combinations (upper triangle)
samplePairs <- t(combn(Sample, 2))
# add the reverse combination (lower triangle)
samplePairs <- rbind(samplePairs, cbind(samplePairs[,2], samplePairs[,1]))
# calculate differences
diffs <- combn(Date, 2, FUN=diff)

# fill in differences using matrix indexing
myMat[samplePairs] <- diffs

Upvotes: 1

Andy W

Reputation: 5089

To get an actual count of days, you can convert the dates to days since some pre-defined date and then use dist. Example below (I converted your dates; I doubt they were represented the way you expected):

time1 <- as.Date("02/10/10","%m/%d/%y")
time2 <- as.Date("02/10/11","%m/%d/%y")
time3 <- as.Date("02/10/12","%m/%d/%y")
test <- data.frame(Sample=c("A","B", "C"), Date=c(time1, time2, time3))
days_s2010 <- difftime(test$Date,as.Date("01/01/10","%m/%d/%y"))
dist_days <- as.matrix(dist(days_s2010,diag=TRUE,upper=TRUE))
rownames(dist_days) <- test$Sample; colnames(dist_days) <- test$Sample

dist_days then prints out:

> dist_days
    A   B   C
A   0 365 730
B 365   0 365
C 730 365   0

Actually, the dates don't need to be converted to days since some origin for dist; simply calling dist(test$Date) will work for days.

Upvotes: 8
