Reputation: 11
I am using GraphDB loadrdf tool to load an ontology and a fairly big data. I set pool.buffer.size=800000 and jvm -Xmx to 24g. I tried both parallel and serial modes. They both slow down once the repo total statements go over about 10k. It eventually slows down to 1 or 2 statements/second. Does anyone know if this is a normal behavior of loadrdf or there's a way to optimize the performance?
Edit I have increased tuple-index-memory. See part of my repo ttl configuration:
owlim:entity-index-size "45333" ;
owlim:cache-memory "24g" ;
owlim:tuple-index-memory "20g" ;
owlim:enable-context-index "false" ;
owlim:enablePredicateList "false" ;
owlim:predicate-memory "0" ;
owlim:fts-memory "0" ;
owlim:ftsIndexPolicy "never" ;
owlim:ftsLiteralsOnly "true" ;
owlim:in-memory-literal-properties "false" ;
owlim:transaction-mode "safe" ;
owlim:transaction-isolation "true" ;
owlim:disable-sameAs "true";
But somehow the process still slows down. It starts with "Global average rate: 1,402 st/s". But slows down to "Global average rate: 20 st/s" after "Statements in repo: 61,831". I give my jvm: -Xms24g -Xmx36g
Upvotes: 1
Views: 204
Reputation: 66
I've looked at you repository configuration ttl. There is this parameter: entity-index-size=45333 whose value needs to be increased, e.g. set it to 100 million (entity-index-size=100000000). Default value for that parameter in GraphDB 7 is 10M, but since you've set it explicitly it gets overriden.
You can read more about that parameter here
Upvotes: 0
Reputation: 173
can you please post your repository configuration? Inside it, there is a parameter tuple-index-memory - this will determine the amount of changes(disc pages) that we are allowed to keep in memory. The bigger this value is the smaller amount of flushes we are going to do.
Check if this is set to a value like 20G in your setup and retry the process again.
Upvotes: 1