Reputation: 1152
I am currently working on a project that uses Neo4j as its database, with queries that involve some hard relationship discovery. After running performance tests, we are having some issues.
We found that the cache has a huge influence on request time (from 3000 ms down to 100 ms or so). Running the same request twice gives one really slow response and a much faster second one. After some searching we found the warm-up approach, which preloads all the nodes and relationships in the database with a query like this:
match (n)-[r]->() return count(1);
With the cache enabled plus this warm-up query, our query times dropped considerably, but they are still not as fast as when the same query has already been run two, three or four times.
So we kept testing and searching until we saw that Neo4j also somehow caches the queries so that they are not compiled every time (using the Scala compiler, if I am right). I say somehow because, after intensive testing, I concluded that Neo4j compiles the query "on the fly".
Let me show a simplified example of what I mean:
(numbers are id attributes)
If I make a request like the following:
match (n:green {id: 1})-[r]->(:red)-[s]->(:green)<-[t]-(m:yellow {id: 7})
return count(m);
What I want to do is find out whether there is a connection between node 1 and node 7. As you can see, I have to discover a bunch of nodes and, more importantly, relationships, and the compile process looks fairly complicated, since the request took 1227 ms to complete. If I make exactly the same request again, I get a response time of about 5 ms, good enough to pass the performance testing. So Neo4j or the Scala compiler was definitely caching the Cypher queries too.
After understanding that there is a compile step in a Cypher request, I went deeper and started modifying only parts of an already cached request. Changing the label or id of the last matched node also produced a delay, but only ~19 ms, which is still acceptable:
match (n:green {id: 1})-[r]->(:red)-[s]->(:green)<-[t]-(m:purple {id: 7})
return count(m);
However, when I restart the server, run the warm-up and change the query so that the first node (labelled n above) doesn't match, the query responds very quickly with 0 results. From this I deduce that not all of the query was parsed: since the first node didn't match, there was no need to go deeper into the tree.
I also tried optional match, since it returns null if no match is found, but that isn't working either.
First of all, I wanted to ask whether everything I have said so far based on my tests is correct and, if it is not, how it actually works. Secondly, what should I do (if there is a way) to cache everything at the beginning, when the server starts? Unfortunately, the requirements of the project say that queries should perform well, even the first one (not to mention that the real scenario has thousands more relationships and nodes, making everything slower). Or is there simply no way to avoid this delay?
Upvotes: 1
Views: 1114
Reputation: 15086
First of all, you need to consider JVM warm-up: classes are loaded lazily when first needed (your first query), and the JIT may only kick in after several thousand calls.
This
match (n)-[r]->() return count(1);
should properly warm up node and relationship cache, however I am not sure if it also loads all their properties and indexes. Also make sure that your data set fits in memory.
Providing values directly in the Cypher query, like {id: 1}, instead of using parameters, like {id: {paramId}}, means that when you change the value of the id the query needs to be compiled again.
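To make the difference concrete, here is a toy model in Python. It is not Neo4j's actual implementation; it only assumes that a compiled plan is cached under the exact query text, which is the behaviour described above. The names PlanCache and compile_count are made up for the illustration.

```python
class PlanCache:
    """Toy stand-in for a query plan cache keyed by the exact query text."""

    def __init__(self):
        self.plans = {}
        self.compile_count = 0

    def run(self, query, params=None):
        # Compile only when this exact text has not been seen before.
        if query not in self.plans:
            self.compile_count += 1
            self.plans[query] = "plan #%d" % self.compile_count
        return self.plans[query]


literal_cache = PlanCache()
for node_id in (1, 2, 3):
    # The value is baked into the text, so every id is a new query.
    literal_cache.run("match (n:green {id: %d}) return count(n);" % node_id)

param_cache = PlanCache()
for node_id in (1, 2, 3):
    # The text never changes; only the parameter map does.
    param_cache.run("match (n:green {id: {paramId}}) return count(n);",
                    {"paramId": node_id})

print(literal_cache.compile_count, param_cache.compile_count)
```

In this model the literal version compiles once per distinct id, while the parameterized version compiles once in total.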
You can pass parameters in this way in shell:
neo4j-sh (?)$ export paramId=5
neo4j-sh (?)$ return {paramId};
==> +-----------+
==> | {paramId} |
==> +-----------+
==> | 5 |
==> +-----------+
==> 1 row
==> 4 ms
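From application code, one option is to run every parameterized query once at startup so that the first real request does not pay the compile cost. The following is only a sketch: warm_up, WARMUP_QUERIES, startId and endId are hypothetical names, and run stands for whatever function your driver offers to execute a query with a parameter map (for example session.run in the official Python driver). The {name} parameter syntax matches the one used above; newer Neo4j versions use $name instead.

```python
def warm_up(run, queries):
    """Execute each (query, params) pair once and return how many were run."""
    executed = 0
    for query, params in queries:
        run(query, params)
        executed += 1
    return executed


# Queries to precompile; the parameter values only need to be valid,
# they do not have to match real requests.
WARMUP_QUERIES = [
    ("match (n)-[r]->() return count(1);", {}),
    ("match (n:green {id: {startId}})-[r]->(:red)-[s]->(:green)"
     "<-[t]-(m:yellow {id: {endId}}) return count(m);",
     {"startId": 1, "endId": 7}),
]

# Stand-in runner that just records calls, so the sketch runs without a server.
recorded = []
count = warm_up(lambda query, params: recorded.append((query, params)),
                WARMUP_QUERIES)
print(count)
```

Passing the runner in as an argument keeps the sketch independent of any particular driver; in a real application you would replace the recording lambda with the driver call.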
So if you need to have performing queries from the beginning
EDIT: added information how to pass parameters in shell
Upvotes: 4