Reputation:
In igraph or networkx, what is the fastest way to find all simple paths of length 4 in a sparse directed graph? One way is to make a graph of a simple path of length 4 and use the subgraph isomorphism vf2 function. Is this the best/fastest way?
I don't have a source node, I would like all simple paths of length 4 that exist in the whole graph.
In my data there are likely to be very few such paths and I would like to be able to iterate over them efficiently.
Upvotes: 5
Views: 1863
Reputation: 104702
If it's only paths of exactly length 4 you need, you can easily find your results by first finding all length two paths, then stitching pairs of them together. In theory you could do this for arbitrary lengths, but it would be messier (you'd probably end up building several power-of-two lists, then combining them to get other lengths).
I'm not very fluent with NetworkX or iGraph, but here's how I'd do it for a graph specified by adjacency lists (where n1[i]
is an iterable containing all the nodes that have an edge going to them from node i
). I think it should be easy to map this to the appropriate API.
n1 = adjacency_lists()
n2 = [[(a, b) for a in x for b in n1[a] if i != b] for i, x in enumerate(n1)]
n4 = [[(a, b, c, d) for (a, b) in x for (c, d) in n2[b]
if i != c and i != d and a != c and a != d]
for (i, x) in enumerate(n2)]
The if
clauses in the list comprehensions ensure that the paths are simple. I've assumed that there are no edges from a node to itself. If that's not true, add to the n2
list comprehension an extra if i != a
clause (before for b in n1[a]
) and add a != b
to the existing if
clause.
You can output the paths with:
for i, paths in enumerate(n4):
for a, b, c, d in paths:
print("{}->{}->{}->{}->{}".format(i, a, b, c, d))
The computation should have asymptotic running time O(N*D^4)
where N
is the number of nodes and D
is the maximum degree of the nodes (which is the best you can do, I think). I doubt there's a substantially faster solution in pure Python. A solution that uses graph library calls that are implemented in C may be faster by a (possibly large) constant factor.
Upvotes: 1
Reputation: 4877
Using a function like this:
def simple_paths(start, length, visited=[]):
if length==0:
yield(visited + [start])
else:
for child in children(start):
if child not in visited:
for path in simple_paths(child, length-1, visited + [start]):
yield(path)
You can list all simple paths of length 4 by calling
for start in nodes():
for path in simple_paths(start, 4):
print path
The above assumes that nodes()
returns an iterable of all nodes in the graph, and that children(x)
returns an iterable of the children of node x
.
Applying the simple_paths()
function to the above graph correctly yields:
['5', '9', '3', '1', '0']
['6', '5', '9', '3', '1']
['6', '5', '3', '1', '0']
['9', '5', '3', '1', '0']
This demonstrates that the function:
['6', '5', '1', '3', '9']
)['6', '5', '3', '1', '5']
)Upvotes: 4
Reputation:
At the beginning, let's solve more simpler problem - calculate number of paths of length 4.
1) Let A be an adjacency matrix of the given graph. A[i][j] = 1
if there exist an edge between vertices I and J and 0 otherwise. A^N
gives number of paths of length N
for some fixed length.
2) matrix squaring looks like
init(RES,0);
for(row=1;N)
for(col=1;N)
for(common=1;N)
RES[row][col] + = a[row][common]*a[common][col];
The physical meaning is such construction is the following:
For each degree deg
of given matrix A
, A[i][j]
stores numbers of paths with length = deg
from i to j
. At the first stage adjacency matrix is just storing number of paths of length=1. When you multiply A^N to A
you are
trying to extend paths of length N to N+1
.
a[row][common]*a[common][col]
can be enterpreted as "there a[row][common]
ways of len=1 from row
to common
and a[common][col]
ways of len=1 from common
to col
. According to combinatorics multiplication principle number of ways with len=1 from row to col is a[row][common]*a[common][col]
".
Now important modification. You want to list all paths not just count them! So, A[i][j] is not integer number but vector
or ArrayList
of integers. Replace RES[row][col] + = a[row][common]*a[common][col]
with RES[row][col].push_back(cartesian_product(a[row][common],a[common][col]))
Complexity of just counting paths is matrix multiplication*degree
= N^3*degree. Applying binary exponentiation you can get N^3*log(degree). In our case degree=4, log(4) = 2, 2~4 - doesn't matter. But now you can't just multiply 2 numbers you should do cartesian product of vectors - paths of len N. So complexity increases to N in common case but in our case to 4)
If you have some questions you are welcome.
Upvotes: 1
Reputation: 4725
You can find the relevant docs here :)
essentially it's just a tree search of some sort
you can use the list of nodes and do this
list_of_nodes = nodes(my_graph)
for mysource in list_of_nodes:
for mytarget in list_of_nodes:
if source != target:
print all_simple_paths(my_graph,source=mysource,target=mytarget, cutoff=4)
If you want all the shortest paths of length 4 you can use ``all_pairs_shortest_path_```
paths = all_pairs_shortest_path(my_graph,cutoff=4)
for k,v in paths:
if len(paths[k][v]) == 4:
print paths[k][v]
There is no real command built in to generate all the simple paths of a certain length as the only way to find that is to generate a tree from every node, e.g. doing a modified version of my for loop above since that uses a modified depth-first search.
"A path with no repeated vertices is called a simple path" - Wikipedia
If you would like a paper citation. The only way to check for simple paths is to check from each node. If you have a graph with 90000 elements; there is no other way to check if two connect :/. The 'target' is really not consequential as it's just another cutoff, though if you have a large number of nodes it can make a difference so you're right there :).
" def _all_simple_paths_graph(G, source, target, cutoff=None):
if cutoff < 1:
return
visited = [source]
stack = [iter(G[source])]
while stack:
children = stack[-1]
child = next(children, None)
if child is None:
stack.pop()
visited.pop()
elif len(visited) < cutoff:
if child == target:
yield visited + [target]
elif child not in visited:
visited.append(child)
stack.append(iter(G[child]))
else: #len(visited) == cutoff:
if child == target or target in children:
yield visited + [target]
stack.pop()
visited.pop()"
code above from networkx docs
to modify it in order to generate the DFS without the 'target' cutoff you can use:
def _all_simple_paths_graph(G, source, target, cutoff=None):
if cutoff < 1:
return
visited = [source]
stack = [iter(G[source])]
while stack:
children = stack[-1]
child = next(children, None)
if child is None:
stack.pop()
visited.pop()
elif len(visited) < cutoff:
#if child == target:
# yield visited + [target]
#elif child not in visited:
if child not in visited:
visited.append(child)
stack.append(iter(G[child]))
#else: #len(visited) == cutoff:
#if child == target or target in children:
# yield visited + [target]
#stack.pop()
#visited.pop()
Hope that works for you :)
Upvotes: 0