Vijay

Reputation: 597

Gremlin server stops responding on stress test with a gremlin query

Am I doing something wrong with this Gremlin query? Is it not a performant query? My 2 Node.js instances on AWS use the gremlin client, which talks over WebSockets through an AWS ELB to 2 Titan 1.0/Gremlin Server instances. The backend is DynamoDB, and we now have the right read/write throughput configured for it.

Log:

WARN org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor - Exception processing a script on request [RequestMessage{, requestId=r1, op='eval', processor='', args={gremlin=

def user = g.V().has("userId", userId1).has("tenantId", tenantId).hasLabel(userLabel).next();g.V(user).outE(eIsOwnedByLabel).inV().as('path').inE(eHasAccessToLabel).or(__.has('shareToType',allType).outV().has('tenantId',tenantId).outE(eHasAccessToLabel),__.has('shareToType',groupType).outV().hasLabel(groupLabel).inE(eIsMemberOfLabel,eIsAdminOfLabel).outV().has('userId',userId).outE(eIsMemberOfLabel,eIsAdminOfLabel).inV().outE(eHasAccessToLabel),__.has('shareToType',userType).outV().hasLabel(userLabel).has('userId',userId).outE(eHasAccessToLabel)).as('role').inV().select('role','path').by('role').by('path');,

bindings={tenantId=1, userLabel=User, userId1=2, eIsOwnedByLabel=is_owned_by, eHasAccessToLabel=has_access_to, eIsMemberOfLabel=is_member_of, eIsAdminOfLabel=is_admin_of, userId=a1, groupLabel=Group, groupType=group, userType=user, allType=all}, accept=application/json, language=gremlin-groovy}}]. org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException
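One possible reading of this log: in TinkerPop, calling next() on a traversal that has no results throws FastNoSuchElementException, so the exception may simply mean the user lookup matched nothing for these bindings. A minimal sketch of a guarded version of the script follows, built as a string on the Node.js side. buildGuardedScript is a hypothetical helper, not part of the gremlin client, and returning an empty list for a missing user is an assumption about the desired behavior.

```javascript
// Sketch only: buildGuardedScript is a hypothetical helper, not a gremlin client API.
// It produces a Gremlin-Groovy script that checks hasNext() before calling next(),
// so a missing user yields an empty list instead of FastNoSuchElementException.
function buildGuardedScript(restOfTraversal) {
  return [
    'def users = g.V().has("userId", userId1).has("tenantId", tenantId).hasLabel(userLabel);',
    // Assumption: an empty result is an acceptable answer when the user does not exist.
    'if (!users.hasNext()) return [];',
    'def user = users.next();',
    'g.V(user)' + restOfTraversal
  ].join('\n');
}

// Usage: pass the remainder of the original traversal (shortened here for illustration).
const script = buildGuardedScript(".outE(eIsOwnedByLabel).inV().as('path')");
console.log(script);
```

The bindings stay exactly as in the original request; only the first statement of the script changes.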

When we stress test, the Gremlin servers just stop responding and give us errors like this:

{"name":"logger","hostname":"a","pid":27881,"level":"ERROR","err":{"message":"null (Error 597)","name":"Error","stack":"Error: null (Error 597)\n at GremlinClient.handleProtocolMessage (/opt/application/sharing-app/node_modules/gremlin/lib/GremlinClient.js:204:39)\n at WebSocketGremlinConnection. (/opt/application/sharing-app/node_modules/gremlin/lib/GremlinClient.js:120:23)\n at emitOne (events.js:96:13)\n at WebSocketGremlinConnection.emit (events.js:188:7)\n at WebSocketGremlinConnection.handleMessage (/opt/application/sharing-app/node_modules/gremlin/lib/WebSocketGremlinConnection.js:69:12)\n at WebSocketGremlinConnection._this.ws.onmessage (/opt/application/sharing-app/node_modules/gremlin/lib/WebSocketGremlinConnection.js:46:20)\n
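For context, 597 is the Gremlin Server status code for a script evaluation error, so this client error most likely reports a script that threw on the server rather than a dropped connection. A small sketch for decoding such errors on the Node.js side is below. describeGremlinError is a hypothetical helper, and it assumes the client always formats messages like the "null (Error 597)" seen in this log.

```javascript
// Gremlin Server response status codes for error cases.
const GREMLIN_STATUS = {
  498: 'malformed request',
  499: 'invalid request arguments',
  500: 'server error',
  597: 'script evaluation error',
  598: 'server timeout',
  599: 'server serialization error'
};

// Hypothetical helper: extracts the status code from a message shaped like
// "null (Error 597)" and maps it to a readable meaning.
function describeGremlinError(err) {
  const match = /\(Error (\d+)\)\s*$/.exec((err && err.message) || '');
  if (!match) return { code: null, meaning: 'unknown' };
  const code = Number(match[1]);
  return { code: code, meaning: GREMLIN_STATUS[code] || 'unknown' };
}

console.log(describeGremlinError(new Error('null (Error 597)')));
```

Logging the decoded meaning next to the raw message makes it easier to tell script failures (597) apart from server timeouts (598) during a stress test.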

I tried to run a profile() locally using g.V().has("userId", '1').has("tenantId", '2').hasLabel('User').outE('is_owned_by')....:

==>Traversal Metrics

Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TitanGraphStep([userId.eq(51ce1780-1998-47eb-a1...                     0           0         190.524    24.91
  optimization                                                                               176.456
  backend-query                                                        0                       6.074
  backend-query                                                        0                       5.067
TitanVertexStep(OUT,[is_owned_by],vertex)@[path]                       0           0           0.005     0.00
TitanVertexStep(IN,[has_access_to],edge)                               0           0         190.539    24.91
OrStep([[HasStep([shareToType.eq(all)]), Profil...                     0           0           0.012     0.00
  HasStep([shareToType.eq(all)])                                       0           0           0.000
  EdgeVertexStep(OUT)                                                  0           0           0.000
  HasStep([tenantId.eq(ndgThunderDome)])                               0           0           0.000
  TitanVertexStep(OUT,[has_access_to],edge)                            0           0           0.000
  HasStep([shareToType.eq(group)])                                     0           0           0.000
  EdgeVertexStep(OUT)                                                  0           0           0.000
  HasStep([~label.eq(Group)])                                          0           0           0.000
  TitanVertexStep(IN,[is_member_of, is_admin_of...                     0           0           0.000
  HasStep([userId.eq(a257c260-261f-45df-a1e7-92...                     0           0           0.000
  TitanVertexStep(OUT,[is_member_of, is_admin_o...                     0           0           0.000
  TitanVertexStep(OUT,[has_access_to],edge)                            0           0           0.000
  HasStep([shareToType.eq(user)])                                      0           0           0.000
  EdgeVertexStep(OUT)                                                  0           0           0.000
  HasStep([~label.eq(User)])                                           0           0           0.000
  HasStep([userId.eq(a257c260-261f-45df-a1e7-92...                     0           0           0.000
  TitanVertexStep(OUT,[has_access_to],edge)                            0           0           0.000
EdgeVertexStep(IN)                                                     0           0         190.550    24.91
SelectStep([role, path],[value(role), value(pat...                     0           0           0.021     0.00
SideEffectCapStep([~metrics])                                          1           1         193.286    25.27
                                            >TOTAL                     -           -         764.940        -

TIA

Upvotes: 0

Views: 518

Answers (1)

Vijay

Reputation: 597

The script was not the issue. The Titan DB was overloaded with requests, and performance degraded until scripts started timing out. I changed dynamodb.properties to add:

cache.db-cache=true
cache.db-cache-time=...
cache.db-cache-size=0.3
cache.db-cache-clean-wait=50

Adding the cache reduced the load on the DB and increased the requests/sec flowing through.

I also changed gremlin-server.yaml, setting threadPoolWorker: 2. I am not sure how threadPoolWorker should be sized relative to CPU cores, though; our m4.large AWS instance has 2 CPU cores. After playing around with the values, I also set maxAccumulationBufferComponents: 8192 and resultIterationBatchSize: 2048.
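On sizing threadPoolWorker: as far as I recall, the Gremlin Server tuning notes in the TinkerPop documentation suggest starting at the default of 1 and raising it one step at a time, up to at most 2 * the number of cores. Treat that bound as an assumption to verify against the documentation for your version. A minimal sketch that prints the bound for the current machine, where maxThreadPoolWorker is a hypothetical helper:

```javascript
const os = require('os');

// Hypothetical helper: upper bound for threadPoolWorker, assuming the
// "at most 2 * cores" guidance applies to this Gremlin Server version.
function maxThreadPoolWorker(cores) {
  return 2 * cores;
}

// On a 2-core instance such as m4.large this gives an upper bound of 4,
// so threadPoolWorker: 2 sits within that range.
const cores = os.cpus().length;
console.log('cores=' + cores + ', threadPoolWorker upper bound=' + maxThreadPoolWorker(cores));
```

The bound is only a ceiling; the value that performs best still has to be found by measuring under load.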

Upvotes: 0
