user2117258
user2117258

Reputation: 515

Compute the distances between points of neighboring time points and find the `n` shortest paths though all time points

Exactly as the title states: I'd like to compute the distances between points of neighboring time points and find the n shortest paths though all time points.

I've posted an example below. In this example there are 2 clear areas (in 3D space) where points are localized. Within each area, we have multiple time points. I'd like to compute the distances between T1 --> T2 --> ... --> T8 while enforcing the time point ordering. I ultimately see this as a tree of some sort where we initially branch from the first point of T1 to the 2 (or more) points from T2, then from each T2 to each T3, etc. Once a tree is built then we can compute the distances through each path from start to end and return the top n paths with the smallest distances. In short, the goal here is to connect each T1 node with its respective shortest path. Perhaps there might be a more efficient or better way to do this.

Example data:

> example
                   Timepoint Centre.int.X Centre.int.Y Centre.int.Z
FOV4.Beads.T1.C2          T1        5.102       28.529        0.789
FOV4.Beads.T1.C2.1        T1       37.904       50.845        0.837
FOV4.Beads.T2.C2          T2       37.905       50.843        1.022
FOV4.Beads.T2.C2.1        T2        5.083       28.491        0.972
FOV4.Beads.T4.C2          T4       37.925       50.851        0.858
FOV4.Beads.T4.C2.1        T4        5.074       28.479        0.785
FOV4.Beads.T5.C2          T5       37.908       50.847        0.977
FOV4.Beads.T5.C2.1        T5        5.102       28.475        0.942
FOV4.Beads.T6.C2          T6        5.114       28.515        0.643
FOV4.Beads.T6.C2.1        T6       37.927       50.869        0.653
FOV4.Beads.T7.C2          T7       37.930       50.875        0.614
FOV4.Beads.T7.C2.1        T7        5.132       28.525        0.579
FOV4.Beads.T8.C2          T8        4.933       28.674        0.800
FOV4.Beads.T8.C2.1        T8       37.918       50.816        0.800

This data.frame produces a 3D scatterplot that looks like this: enter image description here

Baseline code to generate the plot above is posted below:

require(scatterplot3d)
    with(example, {
      s3d <- scatterplot3d(Centre.int.X, Centre.int.Y, Centre.int.Z,
                           pch=19,
                           cex.symbols=2,
                           col.axis="grey", col.grid="lightblue",
                           angle=45, 
                           xlab="X",
                           ylab="Y",
                           zlab="Z")
    })

This is a relatively clean example but some of my data is very messy which is why I am trying to avoid clustering methods (e.g. k-means, dbscan, etc). Any help would be appreciated!

EDIT: Adding structure details.

structure(list(Timepoint = structure(c(1L, 1L, 2L, 2L, 4L, 4L, 
5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L), .Label = c("T1", "T2", "T3", 
"T4", "T5", "T6", "T7", "T8"), class = "factor"), Centre.int.X = c(5.102, 
37.904, 37.905, 5.083, 37.925, 5.074, 37.908, 5.102, 5.114, 37.927, 
37.93, 5.132, 4.933, 37.918), Centre.int.Y = c(28.529, 50.845, 
50.843, 28.491, 50.851, 28.479, 50.847, 28.475, 28.515, 50.869, 
50.875, 28.525, 28.674, 50.816), Centre.int.Z = c(0.789, 0.837, 
1.022, 0.972, 0.858, 0.785, 0.977, 0.942, 0.643, 0.653, 0.614, 
0.579, 0.8, 0.8)), .Names = c("Timepoint", "Centre.int.X", "Centre.int.Y", 
"Centre.int.Z"), class = "data.frame", row.names = c("FOV4.Beads.T1.C2", 
"FOV4.Beads.T1.C2.1", "FOV4.Beads.T2.C2", "FOV4.Beads.T2.C2.1", 
"FOV4.Beads.T4.C2", "FOV4.Beads.T4.C2.1", "FOV4.Beads.T5.C2", 
"FOV4.Beads.T5.C2.1", "FOV4.Beads.T6.C2", "FOV4.Beads.T6.C2.1", 
"FOV4.Beads.T7.C2", "FOV4.Beads.T7.C2.1", "FOV4.Beads.T8.C2", 
"FOV4.Beads.T8.C2.1"))

Upvotes: 3

Views: 122

Answers (2)

javidcf
javidcf

Reputation: 59711

Here is an implementation in Python:

from io import StringIO
import numpy as np
import pandas as pd

# Read data
s = """Name                      Timepoint Centre.int.X Centre.int.Y Centre.int.Z
FOV4.Beads.T1.C2          T1        5.102       28.529        0.789
FOV4.Beads.T1.C2.1        T1       37.904       50.845        0.837
FOV4.Beads.T2.C2          T2       37.905       50.843        1.022
FOV4.Beads.T2.C2.1        T2        5.083       28.491        0.972
FOV4.Beads.T4.C2          T4       37.925       50.851        0.858
FOV4.Beads.T4.C2.1        T4        5.074       28.479        0.785
FOV4.Beads.T5.C2          T5       37.908       50.847        0.977
FOV4.Beads.T5.C2.1        T5        5.102       28.475        0.942
FOV4.Beads.T6.C2          T6        5.114       28.515        0.643
FOV4.Beads.T6.C2.1        T6       37.927       50.869        0.653
FOV4.Beads.T7.C2          T7       37.930       50.875        0.614
FOV4.Beads.T7.C2.1        T7        5.132       28.525        0.579
FOV4.Beads.T8.C2          T8        4.933       28.674        0.800
FOV4.Beads.T8.C2.1        T8       37.918       50.816        0.800"""

df = pd.read_table(StringIO(s), sep=" ", skipinitialspace=True, index_col=0, header=0)

# Get time point ids
ts = sorted(df.Timepoint.unique())
# Get the spatial points in each time point
points = [df[df.Timepoint == t].iloc[:, -3:].values.copy() for t in ts]
# Get the spatial point names in each time point
point_names = [list(df[df.Timepoint == t].index) for t in ts]

# Find the best next point starting from the end
best_nexts = []
accum_dists = [np.zeros(len(points[-1]))]
for t_prev, t_next in zip(reversed(points[:-1]), reversed(points[1:])):
    t_dists = np.linalg.norm(t_prev[:, np.newaxis, :] - t_next[np.newaxis, :, :], axis=-1)
    t_dists += accum_dists[-1][np.newaxis, :]
    t_best_nexts = np.argmin(t_dists, axis=1)
    t_accum_dists = t_dists[np.arange(len(t_dists)), t_best_nexts]
    best_nexts.append(t_best_nexts)
    accum_dists.append(t_accum_dists)
# Reverse back the best next points and accumulated distances
best_nexts = list(reversed(best_nexts))
accum_dists = list(reversed(accum_dists))

# Reconstruct the paths
paths = []
for i, p in enumerate(point_names[0]):
    cost = accum_dists[0][i]
    path = [p]
    idx = i
    for t_best_nexts, t_point_names in zip(best_nexts, point_names[1:]):
        next_idx = t_best_nexts[idx]
        path.append(t_point_names[next_idx])
        idx = next_idx
    paths.append((path, cost))

for i, (path, cost) in enumerate(paths):
    print("Path {} (total distance {}):".format(i, cost))
    print("\n".join("\t{}".format(p) for p in path))
    print()

Output:

Path 0 (total distance 1.23675871386137):
    FOV4.Beads.T1.C2
    FOV4.Beads.T2.C2.1
    FOV4.Beads.T4.C2.1
    FOV4.Beads.T5.C2.1
    FOV4.Beads.T6.C2
    FOV4.Beads.T7.C2.1
    FOV4.Beads.T8.C2

Path 1 (total distance 1.031072818390815):
    FOV4.Beads.T1.C2.1
    FOV4.Beads.T2.C2
    FOV4.Beads.T4.C2
    FOV4.Beads.T5.C2
    FOV4.Beads.T6.C2.1
    FOV4.Beads.T7.C2
    FOV4.Beads.T8.C2.1

Explanation:

It is basically the same as the Viterbi algorithm. You start at the end, and assign an initial cost to every final node of zero. Then, for every pair of contiguous time points t_prev and t_next you compute the distance between every possible pair of points and add the previously accumulated costs of the points in t_next. Then pick the lowest cost next point for every point in t_prev and continue to the previous time point. In the end, best_nexts contains, for every point in every time point, the best possible point to go to in the next time point.

The reconstruction is just a matter of following these indices in best_nexts. For each possible initial point, pick the best point in the next time point and go on.

Upvotes: 1

csgroen
csgroen

Reputation: 2541

Not very elegant, but it works to find the shortest paths.

distance.matrix <- as.matrix(dist(example[,2:4], upper = TRUE, diag = TRUE))

t1s <- grep("T1", rownames(distance.matrix))
paths <- lapply(t1s, function (t) { 
    path <- rownames(distance.matrix)[t]
    distance <- NULL
    for (i in c(2,4:8))
    {
        next.nodes <- grep(paste0("T", i), rownames(distance.matrix))
        next.t <- names(which.min(distance.matrix[t,next.nodes]))
        path <- c(path, next.t)
        distance <- sum(distance, distance.matrix[t,next.t])
        t <- next.t

    }
    output <- list(path, distance)
    names(output) <- c("Path", "Total Distance")
    return(output)
})

EDIT: Cut off some lines that weren't needed.

Upvotes: 1

Related Questions