Areza
Areza

Reputation: 6080

Neo4J connectivity and recommendation among a set of nodes

My database contains a set of nodes and one numeric type of relationship. Given set of nodes I would like to see whether and how they are connected to with each others as well as find nodes that can be included. I am sorry to not be able to synthesize a data. You can think of each node as a city and the relationship is a distance. If the relationship does not exist, means there is not direct way from city A to B. The idea has two fold 1) find whether and how these cities are connected. 2) find other intermediate cities.

For example, if I have A, B, C, D and E - one way is to do a pair-wise shortest distance. The graph is non-directional, so A-B is the same as B-A.

MATCH (n:people {name:'A'})-[r:INTERACTION]-(n2:people {name:'B'}),
      p = shortestPath ((n)-[*]-(n2))
RETURN p

However, instead of doing that (n*n-1)/2 times, I can do something like this:

MATCH (n:people)-[r:INTERACTION]-(n2:people)
MATCH spath=shortestPath ((n)-[*]-(n2))
WHERE n.name in ['A', 'B', 'C', 'D', 'E'] and n2.name in ['A', 'B', 'C', 'D', 'E']
RETURN DISTINCT n.name, n2.name, r.score 

Still, towards the second goal, finding other intermediate cities, I wonder if there is any other concept for doing such analysis within neo4j, and if not, how would you make sure, the output is distinct for each pair, meaning there is no A-B and B-A out, rather only one of them.

Upvotes: 1

Views: 61

Answers (1)

Dave Bennett
Dave Bennett

Reputation: 11216

Certainly if you have a set of data and you want to do something with unique sets where the sets do not contain duplicates it would be best to produce that unique set of items first and then pass that to your query so you don't have to perform unecessary queries and then discard the redunt information afterwards.

To produce a uniqe set of pair sets you can do something like this.

// unwind the name collection twice to get all combinations
WITH ['A', 'B', 'C', 'D', 'E'] AS names
UNWIND names AS name1
UNWIND names AS name2

// makes sure the pairs are ordered and ignore identical pairs
WITH CASE 
       WHEN name1 = name2 THEN NULL 
       WHEN name2 < name1 THEN [name2, name1] 
       ELSE [name1, name2] 
     END AS pair

// collect all of the pairs
RETURN COLLECT (DISTINCT pair) AS pairs

If you are using APOC you could replace that with this

WITH ['A', 'B', 'C', 'D', 'E'] AS names
RETURN apoc.coll.combinations(names, 2) AS pairs

Putting it together with your query you can do something like this...

// from above
WITH ['A', 'B', 'C', 'D', 'E'] AS names
UNWIND names AS name1
UNWIND names AS name2
WITH CASE WHEN name1 = name2 THEN NULL WHEN name2 < name1 THEN [name2, name1] ELSE [name1, name2] END AS pair
WITH COLLECT (DISTINCT pair) AS pairs

// iterate over the unique pairs
UNWIND pairs AS pair

// find the shortest path 
// from  
// the first node identified by the first value in the pair
// to 
// the second node identified by the second value in the pair
MATCH spath = shortestPath((:people {name: pair[0]} )-[*..5]-(:people {name: pair[1]} ))

// produce a string representation of each path 
RETURN reduce(string = "", rel in relationships(spath) | string + startNode(rel).name + " " + type(rel) + " " + endNode(rel).name + ", ")

This query will return the minimum set of paths from your set of "destinations" without any duplication.

Upvotes: 1

Related Questions