Reputation: 808
I'm performing a "stress test" with NEO4J database. It isn't a big deal, but the partial results make me wonder whether this technology is suitable for online applications (or simply I don't get Cypher).
The first test was to add node by node like
(1° node) -[:NEXT_FRAME]-> () -[:NEXT_FRAME]-> () -[:NEXT_FRAME]-> () -[:NEXT_FRAME]-> ... -[:NEXT_FRAME]-> (last node)
and then retrieve the entire path using this query
START n=node:Frame(node_id="0"), m=node:Frame(node_id="9000")
MATCH p=(n)-[:FRAME_NEXT*]->(m)
RETURN p
ORDER BY m.node_id DESC
LIMIT 1
Note, when m.node_id == 2
, the query takes ~100 ms. Now with ~9000 nodes, it can take up to 30 seconds. I am not an expert, but it is too much time! I don't think 9K nodes should make this much difference.
So, what am I missing?
Cheers (and Merry Xmas)
Edited:
I'm using py2neo and timing the query this way:
q_str = """
START n=node:Frame(node_id="0"), m=node:Frame(node_id="%d")
MATCH p=(n)-[:FRAME_NEXT*]->(m)
RETURN p
ORDER BY m.node_id DESC
LIMIT 1
""" % (i,)
print q_str
before = datetime.datetime.now()
query = neo4j.CypherQuery(graph_db, q_str)
record, = query.execute().data
after = datetime.datetime.now()
diff = after - before
diff_ms = diff.total_seconds() *1000
print 'Query took %.2f ms' % (diff_ms)
Upvotes: 1
Views: 685
Reputation: 41706
This is a shortcoming of Cypher, it doesn't handle long variable length paths well right now.
Could you try MATCH p=shortestPath((n)-[:FRAME_NEXT*]->(m))
?
Also if you can, can you try Neo4j 2.0 and instead of using the legacy index, using labels and the new indexes. According to the query plan, there the faster bidirectional-traversal-matcher should be used.
MATCH (n: Frame {node_id:"0"})-[:FRAME_NEXT*]->(m:Frame {node_id:"9000"})
RETURN p
Alternatively will be probably better off with a REST traverser: http://docs.neo4j.org/chunked/milestone/rest-api-traverse.html
or a REST-graph-algo: http://docs.neo4j.org/chunked/milestone/rest-api-graph-algos.html
Upvotes: 2
Reputation: 9952
If there is only one path you can drop ODER BY
and LIMIT
. Also try using the shortestPath
function, it can be much faster even if there is only one matching path in your graph (i.e. even if all paths are the shortest). Finally if you know how deep the variable depth relationship is, declare that in your pattern, and if you know only roughly how deep then specify a range. Try some combinations of these for comparison, you can profile them in the neo4j-shell and look at the execution plan.
MATCH p=(n)-[:FRAME_NEXT*9000]->(m)
MATCH p=(n)-[:FRAME_NEXT*8900..9100]->(m)
MATCH p=shortestPath( (n)-[:FRAME_NEXT*]->(m) )
MATCH p=shortestPath( (n)-[:FRAME_NEXT*8900..9100]->(m) )
Upvotes: 1
Reputation: 39925
The query tries to identify each and every path between n
and m
, which might be a huge number depending on the shape of your graph.
Try to avoid ORDER BY
in such situations. The following might be faster since only one single path needs to be identified:
START n=node:Frame(node_id="0"), m=node:Frame(node_id="9000")
MATCH p=(n)-[:FRAME_NEXT*]->(m)
RETURN p
LIMIT 1
If you're looking for pure performance, you'll be better off using traversal API or graph algorithms directly. This requires some Java coding.
Upvotes: 2