Sint
Sint

Reputation: 1620

Cypher query to match strong two way relationship and filter by property values in Neo4j

I would like to find a subgraph in Neo4j DB with only strong two way relationships.

Let's say the relationship is LOVES and attribute is Kisses then I would like to only find a subgraph where both sides have kissed the other more than 2 times.

START n=node(*)
MATCH n-[r1:LOVES]->friend<-[r2:LOVES]-n
WHERE r1.Kisses > 2 and r2.Kisses > 2
RETURN n, friend, r1, r2
LIMIT 20

the problem is that the query seems to run forever on a 3M node 30M relationship graph, on a 32GB RAM, quad core system with 16GB max heap for neo4j(neo4j calculator suggested 8GB).

I suspect, there is an endless loop hiding somewhere.

OS: Ubuntu 12.04.1 LTS server

Soft: neo4j-community-1.8.1

java version "1.7.0_10" (neo4j start says to use JDK6)

Java(TM) SE Runtime Environment (build 1.7.0_10-b18)

Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)

EDIT: match is incorrect it should be

MATCH n-[r1:LOVES]->friend-[r2:LOVES]->n

UPDATE: After correcting semantics above, I am still unable to get a full result in 5 minutes.

LOVES is the only relationship type and about 10-20% or relationships have a corresponding one going the other way.

My ultimate goal is to find appropriate Kiss values so that I am left with <100k nodes (and all appropriate LOVES relationships) and can export this subgraph.

Here's pseudocode for algorithm for what I am trying to do:

let E be edge.list to be processed
let myedgelist be empty list
for e in E:
  if e.n1 > e.n2: # so we do not look twice
    continue
  if not exist(e[n2][n1]): # this is where lookup can be a problem O(1) for hash, O(logn) for sorted, O(n) for something random)
    continue
  if e.kisses > 2 and e[n2][n1].kisses > 2:
    add e to myedgelist
    add e[n2][n1] to myedgelist
return myedgelist

This algorithm should run at most edgecount * log(edgecount), unless there is no effective way to lookup existence of reverse relationship in neo4j, which would seem rather inconceivable.

Upvotes: 3

Views: 1324

Answers (2)

Michael Hunger
Michael Hunger

Reputation: 41706

Can you try to make friend the target node

START friend=node(*)
MATCH n-[r1:LOVES]->friend-[r2:LOVES]->n
WHERE r1.Kisses > 2 and r2.Kisses > 2
RETURN n, friend, r1, r2
LIMIT 20

you can try to split the match into two parts

START n=node(*)
MATCH n-[r1:LOVES]->friend
WHERE r1.Kisses > 2
WITH n,r1,friend
MATCH n<-[r2:LOVES]-friend
WHERE r2.Kisses > 2
RETURN n, friend, r1, r2
LIMIT 20

or alternatively only do half of the query with a match and the second half with a filter

START n=node(*) 
MATCH n-[r1:LOVES]->friend 
WHERE r1.Kisses > 2 
 AND ALL(r2 in extract(
              p in friend-[:LOVES]->n : head(rels(p))) 
         WHERE r2.Kisses > 2) 
RETURN n, friend, r1
LIMIT 20

here's a console to try the queries: http://console.neo4j.org/r/tqvb1p

But after all you're processing 3M nodes with a match that potentially blows up to 3M^2

Upvotes: 1

Eve Freeman
Eve Freeman

Reputation: 33185

You should try this in 1.9--the cypher matcher has been optimized significantly for this type of query. Although it would be nice to avoid the node(*). Are all of your nodes people? If not you might want to do an index query to only get the relevant person nodes.

start n=node:people("name:*")...

Upvotes: 1

Related Questions