user2171391
user2171391

Reputation:

Sub graph isomorphism

In igraph or networkx, what is the fastest way to find all simple paths of length 4 in a sparse directed graph? One way is to make a graph of a simple path of length 4 and use the subgraph isomorphism vf2 function. Is this the best/fastest way?

I don't have a source node, I would like all simple paths of length 4 that exist in the whole graph.

In my data there are likely to be very few such paths and I would like to be able to iterate over them efficiently.

Upvotes: 5

Views: 1863

Answers (4)

Blckknght
Blckknght

Reputation: 104702

If it's only paths of exactly length 4 you need, you can easily find your results by first finding all length two paths, then stitching pairs of them together. In theory you could do this for arbitrary lengths, but it would be messier (you'd probably end up building several power-of-two lists, then combining them to get other lengths).

I'm not very fluent with NetworkX or iGraph, but here's how I'd do it for a graph specified by adjacency lists (where n1[i] is an iterable containing all the nodes that have an edge going to them from node i). I think it should be easy to map this to the appropriate API.

n1 = adjacency_lists()
n2 = [[(a, b) for a in x for b in n1[a] if i != b] for i, x in enumerate(n1)]
n4 = [[(a, b, c, d) for (a, b) in x for (c, d) in n2[b]
       if i != c and i != d and a != c and a != d]
      for (i, x) in enumerate(n2)]

The if clauses in the list comprehensions ensure that the paths are simple. I've assumed that there are no edges from a node to itself. If that's not true, add to the n2 list comprehension an extra if i != a clause (before for b in n1[a]) and add a != b to the existing if clause.

You can output the paths with:

for i, paths in enumerate(n4):
    for a, b, c, d in paths:
        print("{}->{}->{}->{}->{}".format(i, a, b, c, d))

The computation should have asymptotic running time O(N*D^4) where N is the number of nodes and D is the maximum degree of the nodes (which is the best you can do, I think). I doubt there's a substantially faster solution in pure Python. A solution that uses graph library calls that are implemented in C may be faster by a (possibly large) constant factor.

Upvotes: 1

mitchus
mitchus

Reputation: 4877

Using a function like this:

def simple_paths(start, length, visited=[]):
    if length==0:
        yield(visited + [start])
    else:
        for child in children(start):
            if child not in visited:
                for path in simple_paths(child, length-1, visited + [start]):
                    yield(path)

You can list all simple paths of length 4 by calling

for start in nodes():
    for path in simple_paths(start, 4):
        print path

The above assumes that nodes() returns an iterable of all nodes in the graph, and that children(x) returns an iterable of the children of node x.

enter image description here

Applying the simple_paths() function to the above graph correctly yields:

['5', '9', '3', '1', '0']
['6', '5', '9', '3', '1']
['6', '5', '3', '1', '0']
['9', '5', '3', '1', '0']

This demonstrates that the function:

  • respects directed edges (e.g. it does not choose ['6', '5', '1', '3', '9'])
  • only chooses simple paths (e.g. it does not choose ['6', '5', '3', '1', '5'])

Upvotes: 4

user2104560
user2104560

Reputation:

At the beginning, let's solve more simpler problem - calculate number of paths of length 4.

1) Let A be an adjacency matrix of the given graph. A[i][j] = 1 if there exist an edge between vertices I and J and 0 otherwise. A^N gives number of paths of length N for some fixed length.

2) matrix squaring looks like

init(RES,0);
for(row=1;N)
      for(col=1;N)
         for(common=1;N)
             RES[row][col] + = a[row][common]*a[common][col];

The physical meaning is such construction is the following:
For each degree deg of given matrix A, A[i][j] stores numbers of paths with length = deg from i to j. At the first stage adjacency matrix is just storing number of paths of length=1. When you multiply A^N to A you are trying to extend paths of length N to N+1.

a[row][common]*a[common][col] can be enterpreted as "there a[row][common] ways of len=1 from row to common and a[common][col] ways of len=1 from common to col. According to combinatorics multiplication principle number of ways with len=1 from row to col is a[row][common]*a[common][col]".

Now important modification. You want to list all paths not just count them! So, A[i][j] is not integer number but vector or ArrayList of integers. Replace RES[row][col] + = a[row][common]*a[common][col] with RES[row][col].push_back(cartesian_product(a[row][common],a[common][col])) Complexity of just counting paths is matrix multiplication*degree = N^3*degree. Applying binary exponentiation you can get N^3*log(degree). In our case degree=4, log(4) = 2, 2~4 - doesn't matter. But now you can't just multiply 2 numbers you should do cartesian product of vectors - paths of len N. So complexity increases to N in common case but in our case to 4)

If you have some questions you are welcome.

Upvotes: 1

Eiyrioü von Kauyf
Eiyrioü von Kauyf

Reputation: 4725

You can find the relevant docs here :)

Simple Paths in NetworkX

essentially it's just a tree search of some sort

you can use the list of nodes and do this

list_of_nodes = nodes(my_graph)
for mysource in list_of_nodes:
    for mytarget in list_of_nodes:
        if source != target:
             print all_simple_paths(my_graph,source=mysource,target=mytarget, cutoff=4)

If you want all the shortest paths of length 4 you can use ``all_pairs_shortest_path_```

paths = all_pairs_shortest_path(my_graph,cutoff=4)
for k,v in paths:
    if len(paths[k][v]) == 4:
        print paths[k][v]

There is no real command built in to generate all the simple paths of a certain length as the only way to find that is to generate a tree from every node, e.g. doing a modified version of my for loop above since that uses a modified depth-first search.


"A path with no repeated vertices is called a simple path" - Wikipedia

If you would like a paper citation. The only way to check for simple paths is to check from each node. If you have a graph with 90000 elements; there is no other way to check if two connect :/. The 'target' is really not consequential as it's just another cutoff, though if you have a large number of nodes it can make a difference so you're right there :).

   " def _all_simple_paths_graph(G, source, target, cutoff=None):
        if cutoff < 1:
            return
        visited = [source]
        stack = [iter(G[source])]
        while stack:
            children = stack[-1]
            child = next(children, None)
            if child is None:
                stack.pop()
                visited.pop()
            elif len(visited) < cutoff:
                if child == target:
                    yield visited + [target]
                elif child not in visited:
                    visited.append(child)
                    stack.append(iter(G[child]))
            else: #len(visited) == cutoff:
                if child == target or target in children:
                    yield visited + [target]
                stack.pop()
                visited.pop()"

code above from networkx docs

to modify it in order to generate the DFS without the 'target' cutoff you can use:

   def _all_simple_paths_graph(G, source, target, cutoff=None):
        if cutoff < 1:
            return
        visited = [source]
        stack = [iter(G[source])]
        while stack:
            children = stack[-1]
            child = next(children, None)
            if child is None:
                stack.pop()
                visited.pop()
            elif len(visited) < cutoff:
                #if child == target:
                #    yield visited + [target]
                #elif child not in visited:
                 if child not in visited:
                    visited.append(child)
                    stack.append(iter(G[child]))
            #else: #len(visited) == cutoff:
                #if child == target or target in children:
                #    yield visited + [target]
                #stack.pop()
                #visited.pop()

Hope that works for you :)

Upvotes: 0

Related Questions