bbary
bbary

Reputation: 27

titan-hbase-solr graph load gremlin-server java.lang.OutOfMemoryError: GC overhead limit exceeded

We are trying to create a huge graph (around 100,000 vertices) in titan using gremlin server remote connection. We have followed the example code available at https://github.com/pluradj/titan-tp3-driver-example to create remote connection to titan via gremlin server. We are able to create indices, vertices, edges query the simple graphs created without any problem;

However, when we try to create a huge graph using a generator (it creates vertices and edges directly in the server using remote connection established) , we are getting the following error:

 6041316 [gremlin-server-exec-6] WARN  org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor  - Exception processing a script on request [RequestMessage{, requestId=81f949ad-0e37-4293-bcaa-0714cb159c3b, op='eval', processor='', args={gremlin=g.V().has('idObj', 'OC97').next().addEdge('OC_LC', g.V().has('idObj', 'LC9643').next()), batchSize=64}}].
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.codehaus.groovy.reflection.CachedClass$3.initValue(CachedClass.java:106)
    at org.codehaus.groovy.reflection.CachedClass$3.initValue(CachedClass.java:84)
    at org.codehaus.groovy.util.LazyReference.getLocked(LazyReference.java:49)
    at org.codehaus.groovy.util.LazyReference.get(LazyReference.java:36)
    at org.codehaus.groovy.reflection.CachedClass.getMethods(CachedClass.java:260)
    at groovy.lang.MetaClassImpl.addInterfaceMethods(MetaClassImpl.java:419)
    at groovy.lang.MetaClassImpl.fillMethodIndex(MetaClassImpl.java:342)
    at groovy.lang.MetaClassImpl.initialize(MetaClassImpl.java:3264)
    at org.codehaus.groovy.reflection.ClassInfo.getMetaClassUnderLock(ClassInfo.java:254)
    at org.codehaus.groovy.reflection.ClassInfo.getMetaClass(ClassInfo.java:285)
    at org.codehaus.groovy.reflection.ClassInfo.getMetaClass(ClassInfo.java:295)
    at org.codehaus.groovy.runtime.metaclass.MetaClassRegistryImpl.getMetaClass(MetaClassRegistryImpl.java:261)
    at org.codehaus.groovy.runtime.InvokerHelper.getMetaClass(InvokerHelper.java:873)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.createPojoSite(CallSiteArray.java:125)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.createCallSite(CallSiteArray.java:166)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:133)
    at Script72559.run(Script72559.groovy:1)
    at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:534)
    at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:374)
    at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:233)
    at org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines.eval(ScriptEngines.java:102)
    at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:258)
    at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor$$Lambda$137/1500273035.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

The graph generation is fast in the begining and slows down gradually and fails around 31000 vertices throwing the above error.

We have tried changing the default cache parameters as below

cache.db-cache=true
cache.db-cache-clean-wait=0
cache.db-cache-time=10000
cache.db-cache-size=0.1

Also we have tried deactivating cache by setting cache.db-cache=false. But none of the steps worked for us.

#Our environment:
CDH 5.7.1
Titan 1.1.0-SNAPSHOT
Solr 4.10.3
HBase 1.2.0

Could you please guide us how to overcome this problem?

Upvotes: 1

Views: 329

Answers (1)

bbary
bbary

Reputation: 27

The problem was forgetting to use Parameterized Scripts http://tinkerpop.apache.org/docs/current/reference/#parameterized-scripts

Gremlin Server caches all scripts that are passed to it: using Parameterized Scripts reduces caching because only not common scripts are cached (g.V(x))

Not using Parameterized Scripts and using instead String.format for example (like we did) implies caching all gremlin scripts separately which is very expensive and causes an OutOfMemoryError

I hope this would help someone ;)

Upvotes: 2

Related Questions