Reputation: 31
I'm wondering if there is a function within igraph to calculate connection probabilities among vertices in a weighted graph, where the weights for the edges are probabilities of connection of the adjacent vertices.
I've built a graph based on such an adjacency matrix where adjacent connection probabilities form the weights (this is for a river network so each node of the graph is only connected to a single downstream node).
I had hoped to use something like the shortest.paths
function in igraph but that sums the weights rather than calculates the product of them and I can't work out a way to change that.
The example below shows how I construct the adjacency matrix from the data I have, which is the probability that the vertex is connected to its downstream vertex (ProbConn) and then the identity of the downstream vertex (downstream). The most downstream vertex is the river mouth so it is connected to no other (hence the vector called downstream begins with NA).
library(igraph)
# vector of probability of connectivity to downstream vertex
ProbConn <- c(0, 1, 0.945881098491627, 0.997349787519144, 0.891475447373691,
0.993221681072185, 0.48071450525165, 0.0292543433507856, 0.0248645581575872,
1, 0.00540807765075205, 0.661465657844344, 0.108524549747512,
0.383311676351655, 0.708853495942148, 0.00150109592270933, 0.463859846404347,
0.0011491165581467, 2.87879700370202e-09, 0.536140153595653,
0.00831752330277812, 0.00185182893416988, 0.0186237313262708,
0.398961560996748, 0.582414707676981, 0.338534342155656, 1, 0.00137024127706289,
0.291146504057852, 1, 0.0743301054564134, 0.0514743607033332,
1, 1)
# the downstream vertex of each node
downstream <- c(NA, 1, 2, 3, 4, 5, 6, 2, 2, 7, 5, 8, 4, 6, 10, 3, 11, 3, 4,
11, 6, 6, 9, 9, 9, 8, 12, 5, 10, 13, 6, 6, 14, 15)
# Create the adjacency matrix from these vectors
adjacPI <- matrix(0, nrow=length(downstream), ncol=length(downstream)) # Set up the adjacency matrix to build the distance matrix
for (i in 1:length(downstream)) {
adjacPI[i, downstream[i]] <- ProbConn[i] # Fill the adjacency matrix
}
# create the graph reflecting the downstream connectivity
PIgraph <- graph.adjacency(adjacPI, weighted=T)
plot(PIgraph) # visualise the graph
PIpath <- shortest.paths(PIgraph, mode="out")
# creates the shortest paths matrix based on summing the distances of each step along each path
To extract an example from the shortest paths matrix PIpath, vertices 10 and 34 are connected via vertex 15. As calculated in PIpath, the path distance between vertices 10 and 34 (PIpath[34,10]) is 1.708 which is the sum of the probability of connection between vertices 34 and 15 (PIpath[34,15] = 1), and vertices 15 and 10 (PIpath[15, 10] = 0.708) I would like for that to be a product so the path "distance" between 10 and 34 is 1*0.708.
I'm not entirely sure of the nomenclature but the elements of the matrix I'm looking for would be the product of the transition probabilities of each step between the connected vertices. Essentially replacing the sum function in shortest.paths with a product.
Is it possible to calculate that with a function in igraph or do I need to write some code separately to do this?
Upvotes: 3
Views: 1316
Reputation: 44340
If a path's links have probabilities p_1, p_2, ..., p_n
of succeeding, then (assuming independence of the link success probabilities, which I will do throughout this answer) the probability that the entire path succeeds is p_1 * p_2 * ... * p_n
. As you note this is a product, but shortest path minimizes sums; a common trick to convert products to sums is taking the logarithm. The log of the probability that the path succeeds is log(p_1) + log(p_2) + ... + log(p_n)
. Maximizing that (our goal) is equivalent to minimizing (-log(p_1)) + (-log(p_2)) + ... (-log(p_n))
. Since all the probabilities lie between 0 and 1, their logs are non-positive and therefore the negative of their logs are non-negative.
In conclusion, you can set all your weights to -log(p_i)
, where p_i
is the probability that the connection will succeed, and the shortest path between a pair of nodes (as calculated by the shortest.paths
function in igraph
) will be the path that maximizes the probability of connection. You could build your graph as a one-liner given your vectors ProbConn
and downstream
by switching to graph.data.frame
:
PIgraph <- graph.data.frame(na.omit(cbind(from=seq_along(downstream), to=downstream,
weight=-log(ProbConn))),
vertices=seq_along(downstream))
Upvotes: 5