Reputation: 171
I have Neo4j with a fairly simple schema. There is only one type of node and one type of relationship that can bind nodes. Each node has one property (indexed) and each relationship has four properties. These are the numbers:
neo4j-sh (?)$ dbinfo -g "Primitive count"
{
"NumberOfNodeIdsInUse": 19713210,
"NumberOfPropertyIdsInUse": 109295019,
"NumberOfRelationshipIdsInUse": 44903404,
"NumberOfRelationshipTypeIdsInUse": 1
}
I run this database on a virtual machine with Debian, 7 cores and 26 GB of RAM. This is my Neo4j configuration:
neo4j.properties:
neostore.nodestore.db.mapped_memory=3000M
neostore.relationshipstore.db.mapped_memory=4000M
neostore.propertystore.db.mapped_memory=4000M
neostore.propertystore.db.strings.mapped_memory=300M
neostore.propertystore.db.arrays.mapped_memory=300M
neo4j-wrapper.conf:
wrapper.java.additional=-XX:+UseParallelGC
#wrapper.java.additional=-XX:+UseConcMarkSweepGC
wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
wrapper.java.initmemory=2000
wrapper.java.maxmemory=10000
I use UseParallelGC instead of UseConcMarkSweepGC because I noticed that with UseConcMarkSweepGC only one CPU core was used during a query, whereas with UseParallelGC all cores are utilized. I do not run any queries in parallel, only one at a time in neo4j-shell, but they mostly concern the whole set of nodes, for example:
match (n:User)-->(k:User)
return n.id, count(k) as degree
order by degree desc limit 100;
and it takes 726230 ms to execute. I also tried:
match (n:User)-->()-->(k:User)
return n.id, count(DISTINCT k) as degree
order by degree desc limit 100;
but after a long time I got only "Error occurred in server thread; nested exception is: java.lang.OutOfMemoryError: GC overhead limit exceeded". I have not yet tried queries that filter on relationship properties, but that is also planned.

I think my configuration is not optimal. I noticed that Neo4j uses at most 50% of system memory during a query and the remaining memory stays free. I could change this by setting a larger value in wrapper.java.maxmemory, but I have read that I have to leave some memory for the mapped_memory settings. However, I am not sure whether they are taken into account at all, because during a query there is a lot of free memory. How should I set the configuration for such queries?
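For illustration, this is the kind of split between heap and mapped memory I have in mind; all numbers below are assumptions meant to show the trade-off, not tested recommendations for this dataset:
neo4j-wrapper.conf (sketch):
# Fixed heap for query execution; initmemory = maxmemory avoids resize pauses.
wrapper.java.initmemory=12000
wrapper.java.maxmemory=12000
neo4j.properties (sketch):
# mapped_memory is allocated outside the Java heap, so heap + mapped memory
# + OS overhead must stay within the machine's 26 GB.
neostore.nodestore.db.mapped_memory=500M
neostore.relationshipstore.db.mapped_memory=3000M
neostore.propertystore.db.mapped_memory=6000M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=100M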
Upvotes: 1
Views: 164
Reputation: 39925
Your queries are global queries that get slower as the amount of data grows. For every user node the number of outgoing relationships is calculated, put into a collection and sorted by count. This kind of operation consumes a lot of CPU and memory. Instead of tweaking the config, I guess you're better off refactoring your graph model.
Depending on your use case, consider storing the degree of a user in a property on the user node. Of course any operation adding or removing a relationship for a user needs to be reflected in the degree property. Additionally, you might want to index the degree property.
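A minimal sketch of that approach in Cypher, assuming a degree property and a :KNOWS relationship type (both names are illustrative, they do not appear in the question):
// One-off backfill: store each user's out-degree on the node.
// OPTIONAL MATCH keeps users with no outgoing relationships at degree 0.
MATCH (n:User)
OPTIONAL MATCH (n)-[r]->(:User)
WITH n, count(r) AS d
SET n.degree = d;

// Index the property so later filters on degree avoid a full label scan.
CREATE INDEX ON :User(degree);

// Keep the counter in sync: bump it in the same transaction that
// creates the relationship (and decrement on delete accordingly).
MATCH (a:User {id: 1}), (b:User {id: 2})
CREATE (a)-[:KNOWS]->(b)
SET a.degree = a.degree + 1;

// The expensive top-100 query then becomes a property read:
MATCH (n:User)
RETURN n.id, n.degree AS degree
ORDER BY degree DESC LIMIT 100;
Because the update happens in the same transaction as the relationship change, the stored degree cannot drift from the actual relationship count.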
Upvotes: 1