Reputation: 8913
I have created a very large directed, weighted graph, and I'm trying to find the widest path between two points. Each edge has a count property.
Here is a small portion of the graph (shown as an image in the original post).
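As a purely illustrative stand-in for that image, a tiny graph of the same shape could be created like so (all names and counts here are made up):

CREATE (entry:Vertex {name:'ENTRY'}),
       (a:Vertex {name:'A'}),
       (b:Vertex {name:'B'}),
       (exit:Vertex {name:'EXIT'}),
       (entry)-[:TRAVELED {count: 30}]->(a),
       (a)-[:TRAVELED {count: 5}]->(exit),
       (entry)-[:TRAVELED {count: 10}]->(b),
       (b)-[:TRAVELED {count: 20}]->(exit)

In this miniature, the widest path from ENTRY to EXIT would go through B: its narrowest edge has count 10, versus 5 for the path through A.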
I found this example and modified the query so that the path matching is directional, like so:
MATCH p = (v1:Vertex {name:'ENTRY'})-[:TRAVELED*]->(v2:Vertex {name:'EXIT'})
WITH p, EXTRACT(c IN RELATIONSHIPS(p) | c.count) AS counts
UNWIND counts AS b
WITH p, MIN(b) AS count
ORDER BY count DESC
RETURN NODES(p) AS `Widest Path`, count
LIMIT 1
This query seems to require an enormous amount of memory, and fails even on partial data.
Update: for clarification, the query runs until it runs out of memory.
I've found this link, which combines the use of Spark and Neo4j. Unfortunately, Mazerunner for Neo4j does not support the "widest path" algorithm out of the box. What would be the right approach to running the "widest path" query on a very large graph?
Upvotes: 2
Views: 630
Reputation: 67019
Some thoughts.
Your query (and the original example query) can be simplified. This may or may not be sufficient to prevent your memory issue.
For each matched path, there is no need to: (a) create a collection of counts, (b) UNWIND it into rows, and then (c) perform a MIN aggregation. The same result could be obtained by using the REDUCE function instead:
MATCH p = (v1:Vertex {name:'ENTRY'})-[:TRAVELED*]->(v2:Vertex {name:'EXIT'})
WITH p, REDUCE(m = 2147483647, c IN RELATIONSHIPS(p) | CASE WHEN c.count < m THEN c.count ELSE m END) AS count
ORDER BY count DESC
RETURN NODES(p) AS `Widest Path`, count
LIMIT 1;
(I assume that the count property value is an int; 2147483647 is the max int value.)
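For intuition, the REDUCE expression is just a running minimum. A standalone example over a literal list (the values are made up):

RETURN REDUCE(m = 2147483647, x IN [7, 3, 9] | CASE WHEN x < m THEN x ELSE m END) AS width

This returns 3, the smallest value in the list, just as the query above produces the smallest count along each path.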
You should create an index (or, perhaps more appropriately, a uniqueness constraint) on the name property of the Vertex label. For example:
CREATE INDEX ON :Vertex(name)
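If every Vertex name is unique, the uniqueness-constraint form (which also creates the backing index) would be, in the Cypher syntax of that era:

CREATE CONSTRAINT ON (v:Vertex) ASSERT v.name IS UNIQUE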
EDITED
This enhanced version of the above query might solve your memory problem:
MERGE (t:Temp) SET t.count = 0, t.widest_path = NULL
WITH t
OPTIONAL MATCH p = (v1:Vertex {name:'ENTRY'})-[:TRAVELED*]->(v2:Vertex {name:'EXIT'})
WITH t, p, REDUCE(m = 2147483647, c IN RELATIONSHIPS(p) | CASE WHEN c.count < m THEN c.count ELSE m END) AS count
WHERE count > t.count
SET t.count = count, t.widest_path = EXTRACT(n IN NODES(p) | n.name)
WITH COLLECT(DISTINCT t)[0] AS t
WITH t, t.count AS count, t.widest_path AS `Widest Path`
DELETE t
RETURN `Widest Path`, count;
It creates (and ultimately deletes) a temporary :Temp node to keep track of the currently "winning" count and the names of the nodes on the corresponding path. (Node objects cannot themselves be stored as properties, which is why the query stores the name values instead. You must also make sure that the label Temp is not otherwise used.)
The WITH clause starting with COLLECT(DISTINCT t) uses aggregation of the distinct :Temp nodes (of which there is only one) to ensure that Cypher keeps just a single reference to the :Temp node, no matter how many paths satisfy the WHERE clause. Also, that WITH clause does NOT include p, so Cypher does not accumulate paths that we do not care about. It is this clause that might be the most important in helping to avoid your memory issues.
I have not tried this out.
Upvotes: 2
Reputation: 18022
The reason your algorithm is taking a long time to run is that (a) you have a big graph, (b) your memory parameters probably need tweaking (see comments), and (c) you're enumerating every possible path between ENTRY and EXIT. Depending on how your graph is structured, this could be a huge number of paths (in the worst case, exponentially many).
Note that the "width" of a path is the smallest count on any of its edges, and the widest path is the one that maximizes that minimum. This means that you're probably computing and re-computing many paths you could safely ignore.
Wikipedia has good information on this algorithm you should consider. In particular:
It is possible to find maximum-capacity paths and minimax paths with a single source and single destination very efficiently even in models of computation that allow only comparisons of the input graph's edge weights and not arithmetic on them.[12][18] The algorithm maintains a set S of edges that are known to contain the bottleneck edge of the optimal path; initially, S is just the set of all m edges of the graph. At each iteration of the algorithm, it splits S into an ordered sequence of subsets S1, S2, ... of approximately equal size; the number of subsets in this partition is chosen in such a way that all of the split points between subsets can be found by repeated median-finding in time O(m). The algorithm then reweights each edge of the graph by the index of the subset containing the edge, and uses the modified Dijkstra algorithm on the reweighted graph; based on the results of this computation, it can determine in linear time which of the subsets contains the bottleneck edge weight. It then replaces S by the subset Si that it has determined to contain the bottleneck weight, and starts the next iteration with this new set S. The number of subsets into which S can be split increases exponentially with each step, so the number of iterations is proportional to the iterated logarithm function, O(log* n), and the total time is O(m log* n).[18] In a model of computation where each edge weight is a machine integer, the use of repeated bisection in this algorithm can be replaced by a list-splitting technique of Han & Thorup (2002), allowing S to be split into O(√m) smaller sets Si in a single step and leading to a linear overall time bound.
You should consider implementing this approach in place of your current "enumerate all paths" Cypher query, as the "enumerate all paths" approach has you re-checking the same edge counts for as many paths as there are that involve that particular edge.
There's no ready-made software that will just do this for you; I'd recommend taking that paragraph (and checking its citations for further information) and implementing it yourself. Performance-wise, I think you can do much better than your current query.
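As one concrete, comparison-only direction (a sketch in the same spirit as the quoted approach, not the full algorithm): binary-search the bottleneck value over the sorted distinct count values, issuing one Cypher reachability probe per candidate threshold. Here {w} is an old-style query parameter supplied by your client code:

MATCH p = (v1:Vertex {name:'ENTRY'})-[:TRAVELED*]->(v2:Vertex {name:'EXIT'})
WHERE ALL(r IN RELATIONSHIPS(p) | r.count >= {w})
RETURN true AS reachable
LIMIT 1

If the probe returns a row, a path of width at least w exists, so the search moves to higher thresholds; otherwise it moves lower. The largest feasible threshold is the width of the widest path. A successful probe stops at the first qualifying path thanks to LIMIT 1, although an infeasible threshold can still force a long traversal.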
Upvotes: 4