Reputation: 31
I have installed the APOC procedures and run "CALL apoc.warmup.run()".
The result is as follows:
pageSize      8192
nodesPerPage  546
nodesTotal    156255221
nodesLoaded   286182
nodesTime     21
relsPerPage   240
relsTotal     167012639
relsLoaded    695886
relsTime      8
totalTime     30
It looks like the Neo4j server only caches part of the nodes and relationships. I want it to cache all of them in order to improve query performance.
Upvotes: 0
Views: 271
Reputation: 11715
First of all, for all the data to be cached, you need a page cache large enough to hold it.
Second, the problem is not that Neo4j fails to cache everything it can; it is more of a bug in the apoc.warmup.run procedure. The procedure retrieves the number of nodes (resp. relationships) in the database and expects all of them to have ids between 1 and that number. That does not hold if the database has seen some churn, such as creating nodes and then deleting some of them.
I believe that could be fixed by using another query instead:
MATCH (n) RETURN count(n) AS count, max(id(n)) AS maxId
Profiling this query shows about as many DB hits as there are nodes, and it takes about 650 ms on my machine for 1.4 million nodes.
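To illustrate why the node count alone is not a safe upper bound for ids, here is a toy Python model of the situation. It is only a sketch, not the APOC implementation: it assumes ids are handed out sequentially starting at 0 and are not reused after deletion, and the function names are made up for the example.

```python
# Toy model of a node store (hypothetical, not Neo4j or APOC code).
# Assumption: ids are allocated sequentially from 0 and never reused.

def live_ids_after_churn(created, deleted_ids):
    """Return the sorted ids that still exist after some deletions."""
    return sorted(set(range(created)) - set(deleted_ids))

def touched_using_count(live_ids):
    """Scan ids 0..count-1, as a warmup relying only on count(n) would."""
    live = set(live_ids)
    return [i for i in range(len(live_ids)) if i in live]

def touched_using_max_id(live_ids):
    """Scan ids 0..maxId, as a warmup relying on max(id(n)) would."""
    live = set(live_ids)
    return [i for i in range(max(live_ids) + 1) if i in live]

# Create 10 nodes, then delete three of them.
live = live_ids_after_churn(10, [1, 2, 3])
by_count = touched_using_count(live)
by_max_id = touched_using_max_id(live)

print("live ids:          ", live)
print("reached via count: ", by_count)
print("reached via max id:", by_max_id)
```

In this model the scan bounded by the count stops before the highest ids, while the scan bounded by the maximum id reaches every live node.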
Update: I've opened an issue on the subject.
Update 2
While the issue with the ids is real, I missed the real reason why the procedure reports reading far fewer nodes: it only reads one node per page (assuming they're stored sequentially), since it's the pages that are cached. With the current values, that means reading one node out of every 546. With integer division, 156255221 ÷ 546 = 286181, and counting node 0 that makes 286182 nodes loaded.
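The same arithmetic can be checked for both nodes and relationships with a few lines of Python. This is only a sketch using the numbers reported in the question; the helper name is made up, and the size estimate assumes each loaded record corresponds to exactly one 8192-byte page of the node or relationship store, so it says nothing about properties, indexes, or other store files.

```python
# Values taken from the apoc.warmup.run output in the question.
PAGE_SIZE = 8192
NODES_TOTAL, NODES_PER_PAGE = 156255221, 546
RELS_TOTAL, RELS_PER_PAGE = 167012639, 240

def records_read(total, per_page):
    """One record per page: ids 0, per_page, 2 * per_page, ... up to total."""
    return total // per_page + 1

nodes_loaded = records_read(NODES_TOTAL, NODES_PER_PAGE)
rels_loaded = records_read(RELS_TOTAL, RELS_PER_PAGE)

# Rough size of the pages touched (node and relationship stores only).
touched_bytes = (nodes_loaded + rels_loaded) * PAGE_SIZE
touched_gib = touched_bytes / 2**30

print("nodes loaded:", nodes_loaded)
print("rels loaded: ", rels_loaded)
print(f"pages touched: about {touched_gib:.2f} GiB")
```

Both counts match the nodesLoaded and relsLoaded values in the question, which supports the one-record-per-page explanation. The size figure is at best a lower bound on the page cache needed for the whole database.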
Upvotes: 2