Neo4j: How to cache all the nodes and relationships in RAM to improve query performance

Question

I have installed the APOC procedures and run "CALL apoc.warmup.run".

The result is as follows:

pageSize
8192

nodesPerPage  nodesTotal  nodesLoaded  nodesTime
546           156255221   286182       21

relsPerPage   relsTotal   relsLoaded   relsTime
240           167012639   695886       8

totalTime
30

It looks like the Neo4j server only caches part of the nodes and relationships. But I want it to cache all of them in order to improve query performance.

Frank Pavageau · Accepted Answer

First of all, for all data to be cached, you need a page cache large enough.
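As a rough lower bound, the figures from the question are enough to estimate the size of the node and relationship stores alone. The sketch below only redoes that arithmetic in Python; it assumes the page size and records-per-page values reported by the warmup output, and it ignores properties, indexes and anything else the page cache has to hold, so the real requirement will be larger.

```python
PAGE_SIZE = 8192  # bytes, as reported by the warmup output in the question


def store_bytes(total_records, records_per_page):
    """Bytes needed to hold every page of a store, rounded up to whole pages."""
    pages = -(-total_records // records_per_page)  # integer ceiling division
    return pages * PAGE_SIZE


node_bytes = store_bytes(156_255_221, 546)  # nodesTotal, nodesPerPage
rel_bytes = store_bytes(167_012_639, 240)   # relsTotal, relsPerPage
total_gib = (node_bytes + rel_bytes) / 2**30

print(f"nodes: {node_bytes} B, relationships: {rel_bytes} B, "
      f"total: {total_gib:.2f} GiB")
```

With these inputs the two stores together come to roughly 7.5 GiB, so a page cache smaller than that cannot hold all nodes and relationships at once.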

Then, the problem is not that Neo4j fails to cache all it can; it's more of a bug in the apoc.warmup.run procedure: it retrieves the number of nodes (resp. relationships) in the database, and expects all of them to have ids between 1 and that number. However, that's not true if you've had some churn in the DB, such as creating more nodes and then deleting some of them: the remaining nodes can then have ids higher than the count.

I believe that could be fixed by using another query instead:

MATCH (n) RETURN count(n) AS count, max(id(n)) AS maxId

as profiling it shows about the same number of DB hits as the number of nodes, and takes about 650 ms on my machine for 1.4 million nodes.
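To illustrate why the count alone is not a safe upper bound, here is a toy model in Python. It is not the APOC code: the id values and the helper name are made up for the example, and it simply assumes a scan that visits ids 0 up to the node count.

```python
def missed_by_count_scan(live_ids):
    """Return the live ids that a scan over 0..count never reaches."""
    count = len(live_ids)
    scanned = set(range(count + 1))  # ids 0..count, as a count-based scan assumes
    return live_ids - scanned


# Hypothetical store: 10 nodes created (ids 0..9), then ids 2, 3 and 5 deleted.
live = set(range(10)) - {2, 3, 5}
missed = missed_by_count_scan(live)
max_id = max(live)

print(f"count={len(live)}, maxId={max_id}, missed={sorted(missed)}")
```

In this toy case the count is 7 but the highest id is 9, so the nodes with ids 8 and 9 are never visited; bounding the scan by maxId instead would cover them.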

Update: I've opened an issue on the subject.

Update 2

While the issue with the ids is real, I missed the real reason why the procedure reports reading far fewer nodes: it only reads one node per page (assuming they're stored sequentially), since it's the pages that are cached. With the current values, that means trying to read one node every 546 nodes. It happens that 156255221 ÷ 546 = 286181 (rounded down), and with node 0 that makes 286182 nodes loaded.
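The same arithmetic can be checked for both nodes and relationships. This is only a sketch of the sampling described above, assuming one id is read per page starting from id 0; the function name is hypothetical and it is not the procedure's actual implementation.

```python
def sampled_ids(total, per_page):
    """One id per page: 0, per_page, 2 * per_page, ... below total."""
    return range(0, total, per_page)


nodes_loaded = len(sampled_ids(156_255_221, 546))  # nodesTotal, nodesPerPage
rels_loaded = len(sampled_ids(167_012_639, 240))   # relsTotal, relsPerPage

print(nodes_loaded, rels_loaded)
```

Both values match the nodesLoaded (286182) and relsLoaded (695886) figures in the question, which is consistent with every page having been touched once.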
