Reputation: 597
Am I doing something wrong with this Gremlin query? Is it not a performant query? My two Node.js instances on AWS use the gremlin client, which talks over WebSockets through an AWS ELB to two Titan 1.0/Gremlin Server instances. The backend is DynamoDB, and we now have the right read/write throughput configured for it.
Log:
WARN org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor - Exception processing a script on request [RequestMessage{, requestId=r1, op='eval', processor='', args={gremlin=
def user = g.V().has("userId", userId1).has("tenantId", tenantId).hasLabel(userLabel).next();
g.V(user).outE(eIsOwnedByLabel).inV().as('path').inE(eHasAccessToLabel)
  .or(__.has('shareToType',allType).outV().has('tenantId',tenantId).outE(eHasAccessToLabel),
      __.has('shareToType',groupType).outV().hasLabel(groupLabel).inE(eIsMemberOfLabel,eIsAdminOfLabel).outV().has('userId',userId).outE(eIsMemberOfLabel,eIsAdminOfLabel).inV().outE(eHasAccessToLabel),
      __.has('shareToType',userType).outV().hasLabel(userLabel).has('userId',userId).outE(eHasAccessToLabel))
  .as('role').inV().select('role','path').by('role').by('path');,
bindings={tenantId=1, userLabel=User, userId1=2, eIsOwnedByLabel=is_owned_by, eHasAccessToLabel=has_access_to, eIsMemberOfLabel=is_member_of, eIsAdminOfLabel=is_admin_of, userId=a1, groupLabel=Group, groupType=group, userType=user, allType=all}, accept=application/json, language=gremlin-groovy}}]. org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException
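Worth noting: the FastNoSuchElementException itself comes from the first next() in the script. When no vertex matches the userId/tenantId/label filters, the traversal is empty and next() fails fast (the profile output further down also shows a Count of 0 on the first step). A guard with tryNext() avoids the server-side exception; a minimal sketch of the first line of the script, assuming the same bindings:

```groovy
// Sketch: guard the lookup instead of calling next() on a possibly-empty traversal
def userOpt = g.V().has("userId", userId1).has("tenantId", tenantId).hasLabel(userLabel).tryNext()
if (!userOpt.isPresent()) {
    return []   // no matching user: return an empty result instead of throwing
}
def user = userOpt.get()
// ... rest of the traversal unchanged, starting from g.V(user) ...
```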
When we stress test, the Gremlin servers just stop responding and give us errors like this:
{"name":"logger","hostname":"a","pid":27881,"level":"ERROR","err":{"message":"null (Error 597)","name":"Error","stack":"Error: null (Error 597)\n at GremlinClient.handleProtocolMessage (/opt/application/sharing-app/node_modules/gremlin/lib/GremlinClient.js:204:39)\n at WebSocketGremlinConnection. (/opt/application/sharing-app/node_modules/gremlin/lib/GremlinClient.js:120:23)\n at emitOne (events.js:96:13)\n at WebSocketGremlinConnection.emit (events.js:188:7)\n at WebSocketGremlinConnection.handleMessage (/opt/application/sharing-app/node_modules/gremlin/lib/WebSocketGremlinConnection.js:69:12)\n at WebSocketGremlinConnection._this.ws.onmessage (/opt/application/sharing-app/node_modules/gremlin/lib/WebSocketGremlinConnection.js:46:20)\n
I tried to run a profile() locally using g.V().has("userId", '1').has("tenantId", '2').hasLabel('User').outE('is_owned_by')....:
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
TitanGraphStep([userId.eq(51ce1780-1998-47eb-a1... 0 0 190.524 24.91
optimization 176.456
backend-query 0 6.074
backend-query 0 5.067
TitanVertexStep(OUT,[is_owned_by],vertex)@[path] 0 0 0.005 0.00
TitanVertexStep(IN,[has_access_to],edge) 0 0 190.539 24.91
OrStep([[HasStep([shareToType.eq(all)]), Profil... 0 0 0.012 0.00
HasStep([shareToType.eq(all)]) 0 0 0.000
EdgeVertexStep(OUT) 0 0 0.000
HasStep([tenantId.eq(ndgThunderDome)]) 0 0 0.000
TitanVertexStep(OUT,[has_access_to],edge) 0 0 0.000
HasStep([shareToType.eq(group)]) 0 0 0.000
EdgeVertexStep(OUT) 0 0 0.000
HasStep([~label.eq(Group)]) 0 0 0.000
TitanVertexStep(IN,[is_member_of, is_admin_of... 0 0 0.000
HasStep([userId.eq(a257c260-261f-45df-a1e7-92... 0 0 0.000
TitanVertexStep(OUT,[is_member_of, is_admin_o... 0 0 0.000
TitanVertexStep(OUT,[has_access_to],edge) 0 0 0.000
HasStep([shareToType.eq(user)]) 0 0 0.000
EdgeVertexStep(OUT) 0 0 0.000
HasStep([~label.eq(User)]) 0 0 0.000
HasStep([userId.eq(a257c260-261f-45df-a1e7-92... 0 0 0.000
TitanVertexStep(OUT,[has_access_to],edge) 0 0 0.000
EdgeVertexStep(IN) 0 0 190.550 24.91
SelectStep([role, path],[value(role), value(pat... 0 0 0.021 0.00
SideEffectCapStep([~metrics]) 1 1 193.286 25.27
>TOTAL - - 764.940 -
TIA
Upvotes: 0
Views: 518
Reputation: 597
The script was not the issue. The Titan DB got overloaded with requests, and performance degraded with scripts timing out. Changing dynamodb.properties to add:
# enable Titan's database-level cache
cache.db-cache=true
# ms to keep cached entries (0 means never expire)
cache.db-cache-time=...
# a value below 1.0 is a fraction of the JVM heap (30% here)
cache.db-cache-size=0.3
# ms cached entries are kept after a flush (matters for eventually-consistent backends)
cache.db-cache-clean-wait=50
Adding the cache reduced the load on the DB and increased the requests/sec flowing through.
Changed gremlin-server.yaml too, setting threadPoolWorker: 2. I'm not sure how to size threadPoolWorker based on CPU cores, though, on our m4.large AWS instance with 2 cores. I also arrived at these by experimenting with the values:
maxAccumulationBufferComponents: 8192
resultIterationBatchSize: 2048
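For reference, the relevant fragment of our gremlin-server.yaml ends up looking roughly like this. The sizing comments are rules of thumb from the TinkerPop docs, not measured values, and the gremlinPool line is an addition I'd consider rather than something we tuned; treat all the numbers as starting points:

```yaml
# gremlin-server.yaml (fragment) -- values are starting points, not definitive
# threadPoolWorker handles network I/O; the TinkerPop docs suggest keeping it
# at no more than roughly 2x the number of CPU cores (2 cores on an m4.large)
threadPoolWorker: 2
# gremlinPool runs the actual scripts; it defaults to the number of available
# processors, so it is usually the pool to grow if scripts queue up
gremlinPool: 4
maxAccumulationBufferComponents: 8192
resultIterationBatchSize: 2048
```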
Upvotes: 0