Reputation: 83
I have 3 node in datastax enterprise and loaded 65 million vertices and edges on these. when i use dse studio or gremlin console and run gremlin query on my graph the query is too slow. I defined any kind of index and test again but Had no effect. when i run query for example "g.v().count()" cpu usage and cpu load average no much change while if i run cql query, that distribute on all nodes and cpu usage and cpu load average on all nodes are a significant change what is best practice or best configurations for efficient gremlin query in this case?
Upvotes: 2
Views: 913
Reputation: 83
After many tests on different queries I came to this conclusion that it seems the gremlin has a problem with count query on million vertices while when you define a index on property of vertices and find specific vertix for example: g.V().hasLabel('member').has('C_ID','4242833')
the time of this query lower than 1 second and this is acceptable. question is here why gremlin has problem with count query on million vertices?
Upvotes: 1
Reputation: 46206
count()
based traversals should be executed via OLAP with Spark for graphs of the size you are working with. If you using standard OLTP based traversals you can expect long wait times for this type of query.
Note that this rule holds true for any graph computation that must do a "table scan" (i.e. touch all or a very large portion of vertices/edges in the graph). This issue is not specific to DSE Graph either and will apply to virtually any graph database.
Upvotes: 5