Reputation: 1620
I would like to find a subgraph in Neo4j DB with only strong two way relationships.
Let's say the relationship is LOVES and attribute is Kisses then I would like to only find a subgraph where both sides have kissed the other more than 2 times.
START n=node(*)
MATCH n-[r1:LOVES]->friend<-[r2:LOVES]-n
WHERE r1.Kisses > 2 and r2.Kisses > 2
RETURN n, friend, r1, r2
LIMIT 20
the problem is that the query seems to run forever on a 3M node 30M relationship graph, on a 32GB RAM, quad core system with 16GB max heap for neo4j(neo4j calculator suggested 8GB).
I suspect, there is an endless loop hiding somewhere.
OS: Ubuntu 12.04.1 LTS server
Soft: neo4j-community-1.8.1
java version "1.7.0_10" (neo4j start says to use JDK6)
Java(TM) SE Runtime Environment (build 1.7.0_10-b18)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
EDIT: match is incorrect it should be
MATCH n-[r1:LOVES]->friend-[r2:LOVES]->n
UPDATE: After correcting semantics above, I am still unable to get a full result in 5 minutes.
LOVES is the only relationship type and about 10-20% or relationships have a corresponding one going the other way.
My ultimate goal is to find appropriate Kiss values so that I am left with <100k nodes (and all appropriate LOVES relationships) and can export this subgraph.
Here's pseudocode for algorithm for what I am trying to do:
let E be edge.list to be processed
let myedgelist be empty list
for e in E:
if e.n1 > e.n2: # so we do not look twice
continue
if not exist(e[n2][n1]): # this is where lookup can be a problem O(1) for hash, O(logn) for sorted, O(n) for something random)
continue
if e.kisses > 2 and e[n2][n1].kisses > 2:
add e to myedgelist
add e[n2][n1] to myedgelist
return myedgelist
This algorithm should run at most edgecount * log(edgecount), unless there is no effective way to lookup existence of reverse relationship in neo4j, which would seem rather inconceivable.
Upvotes: 3
Views: 1324
Reputation: 41706
Can you try to make friend
the target node
START friend=node(*)
MATCH n-[r1:LOVES]->friend-[r2:LOVES]->n
WHERE r1.Kisses > 2 and r2.Kisses > 2
RETURN n, friend, r1, r2
LIMIT 20
you can try to split the match into two parts
START n=node(*)
MATCH n-[r1:LOVES]->friend
WHERE r1.Kisses > 2
WITH n,r1,friend
MATCH n<-[r2:LOVES]-friend
WHERE r2.Kisses > 2
RETURN n, friend, r1, r2
LIMIT 20
or alternatively only do half of the query with a match and the second half with a filter
START n=node(*)
MATCH n-[r1:LOVES]->friend
WHERE r1.Kisses > 2
AND ALL(r2 in extract(
p in friend-[:LOVES]->n : head(rels(p)))
WHERE r2.Kisses > 2)
RETURN n, friend, r1
LIMIT 20
here's a console to try the queries: http://console.neo4j.org/r/tqvb1p
But after all you're processing 3M nodes with a match that potentially blows up to 3M^2
Upvotes: 1
Reputation: 33185
You should try this in 1.9--the cypher matcher has been optimized significantly for this type of query. Although it would be nice to avoid the node(*). Are all of your nodes people? If not you might want to do an index query to only get the relevant person nodes.
start n=node:people("name:*")
...
Upvotes: 1