Srinath Ganesh
Srinath Ganesh

Reputation: 2558

Cypher Query: Find minimum and maximum relationship length over a big graphs

Objective: Find the minimum and maximum relationship lengths between two Node Types.

Example: Following are the dummy connections for a Node type 'T'

So the minimum hops to reach two nodes is 1 (i.e. Aplha is directly linked to Charlie), and the maximum hops to reach two nodes is 2 (i.e. (Aplha)--(Beta)--(Charlie)).

Cypher Query I have is like:

MATCH (from:T), (to:T), p=shortestPath((from)-[*]-(to))
RETURN min(length(p)) as min, max(length(p)) as max

Which works fine for smaller data-sets but for 2000 nodes is 'from' and 2000 nodes is 'to' connected with a level like min:5 and max:10 hops, this query takes like 30mins to run.

Is there any way to achieve this operations is a faster way?

Solutions I CANNOT use:

Upvotes: 2

Views: 1903

Answers (1)

InverseFalcon
InverseFalcon

Reputation: 30407

Minimum is easy enough using APOC's path expander procedures (only the latest winter 2018 release for either 3.2.x or 3.3.x).

You can use one group as your start nodes, and use the :T label in the label filter as the termination label (the end of the path for expansion) and add a limit:

MATCH (start:T)
WITH collect(start) as startNodes
CALL apoc.path.expandConfig(startNodes, {labelFilter:'/T', limit:1, minLevel:1, uniqueness:'NODE_GLOBAL'}) YIELD path
RETURN length(path) as min

We're using expandConfig() and NODE_GLOBAL uniqueness, which drastically helps out during expansion as we can prune (don't need to consider) any paths that end at a node that has already been visited.

The path expander procedures are great when you're looking for paths to nodes with certain labels, note that we don't need to match to end nodes and create cross products, we will evaluate labels of nodes during expansion, and stop when a node with the :T label is reached.

The limit:1 will automatically stop when the first result is found, and the expansion uses breadth-first-search, so the first match will be the shortest path possible.

For finding the longest of ALL the shortest paths (from each :T node to just its nearest :T node), the approach will be similar, but we will not collect the results, so the procedure will execute for every single :T node.

MATCH (start:T)
CALL apoc.path.expandConfig(start, {labelFilter:'/T', limit:1, minLevel:1, uniqueness:'NODE_GLOBAL'}) YIELD path
RETURN max(length(path)) as maxShortest

For finding the longest shortest-path between every two :T nodes, however, is likely to perform worse.

We can use a similar approach, but we'll remove the LIMIT, and change the label filter to use :T as an end node (paths must end at :T nodes, but can expand past them to find paths to other :T nodes)

MATCH (start:T)
CALL apoc.path.expandConfig(start, {labelFilter:'>T', minLevel:1, uniqueness:'NODE_GLOBAL'}) YIELD path
RETURN max(length(path)) as maxShortestOfAllPairs

Upvotes: 4

Related Questions