Reputation: 171
I have Neo4j with a fairly simple schema. There is only one type of node and one type of relationship that can bind nodes. Each node has one property (indexed) and each relationship has four properties. These are the numbers:
neo4j-sh (?)$ dbinfo -g "Primitive count"
{
"NumberOfNodeIdsInUse": 19713210,
"NumberOfPropertyIdsInUse": 109295019,
"NumberOfRelationshipIdsInUse": 44903404,
"NumberOfRelationshipTypeIdsInUse": 1
}
I run this database on a virtual machine with Debian, 7 cores and 26 GB of RAM. This is my Neo4j configuration:
neo4j.properties:
neostore.nodestore.db.mapped_memory=3000M
neostore.relationshipstore.db.mapped_memory=4000M
neostore.propertystore.db.mapped_memory=4000M
neostore.propertystore.db.strings.mapped_memory=300M
neostore.propertystore.db.arrays.mapped_memory=300M
neo4j-wrapper.conf:
wrapper.java.additional=-XX:+UseParallelGC
#wrapper.java.additional=-XX:+UseConcMarkSweepGC
wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
wrapper.java.initmemory=2000
wrapper.java.maxmemory=10000
I use UseParallelGC instead of UseConcMarkSweepGC because I noticed that with UseConcMarkSweepGC only one CPU core was used during a query, whereas with UseParallelGC all cores are utilized. I do not run any queries in parallel, only one at a time in neo4j-shell, but they mostly concern the whole set of nodes, for example:
match (n:User)-->(k:User)
return n.id, count(k) as degree
order by degree desc limit 100;
and it takes 726230 ms to execute. I also tried:
match (n:User)-->()-->(k:User)
return n.id, count(DISTINCT k) as degree
order by degree desc limit 100;
but after a long time I got only "Error occurred in server thread; nested exception is: java.lang.OutOfMemoryError: GC overhead limit exceeded". I have not yet tried queries that filter on relationship properties, but that is also planned.

I think my configuration is not optimal. I noticed that Neo4j uses at most 50% of system memory during a query and the remaining memory stays free. I could change this by setting a larger value in wrapper.java.maxmemory, but I have read that I have to leave some memory for the mapped_memory settings. However, I am not sure whether they are taken into account at all, because during a query there is a lot of free memory. How should I set the configuration for such queries?
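For illustration, this is the kind of split between heap and mapped memory I have in mind; all numbers below are assumptions meant to show the trade-off, not tested recommendations for this dataset:
neo4j-wrapper.conf (sketch):
# Fixed heap for query execution; initmemory = maxmemory avoids resize pauses.
wrapper.java.initmemory=12000
wrapper.java.maxmemory=12000
neo4j.properties (sketch):
# mapped_memory is allocated outside the Java heap, so heap + mapped memory
# + OS overhead must stay within the machine's 26 GB.
neostore.nodestore.db.mapped_memory=500M
neostore.relationshipstore.db.mapped_memory=3000M
neostore.propertystore.db.mapped_memory=6000M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=100M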
Upvotes: 1
Views: 164
Reputation: 39925
Your queries are global queries that get slower as the amount of data grows. For every user node the number of outgoing relationships is calculated, put into a collection and sorted by count. This kind of operation consumes a lot of CPU and memory. Instead of tweaking the config, I guess you're better off refactoring your graph model.
Depending on your use case, consider storing the degree of a user in a property on the user node. Of course any operation adding or removing a relationship for a user needs to be reflected in the degree property. Additionally, you might want to index the degree property.
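A minimal sketch of that approach in Cypher, assuming a degree property and a :KNOWS relationship type (both names are illustrative, they do not appear in the question):
// One-off backfill: store each user's out-degree on the node.
// OPTIONAL MATCH keeps users with no outgoing relationships at degree 0.
MATCH (n:User)
OPTIONAL MATCH (n)-[r]->(:User)
WITH n, count(r) AS d
SET n.degree = d;

// Index the property so later filters on degree avoid a full label scan.
CREATE INDEX ON :User(degree);

// Keep the counter in sync: bump it in the same transaction that
// creates the relationship (and decrement on delete accordingly).
MATCH (a:User {id: 1}), (b:User {id: 2})
CREATE (a)-[:KNOWS]->(b)
SET a.degree = a.degree + 1;

// The expensive top-100 query then becomes a property read:
MATCH (n:User)
RETURN n.id, n.degree AS degree
ORDER BY degree DESC LIMIT 100;
Because the update happens in the same transaction as the relationship change, the stored degree cannot drift from the actual relationship count.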
Upvotes: 1