Reputation: 21
I have run into an issue using plain old TinkerGraph to drop a moderately sized set of vertices. In total, about 1250 vertices and 2500 edges will be dropped.
When running the following:
g.V(ids).drop().iterate()
It takes around 20-30 seconds. This seems ridiculous and I have seemingly verified that it is not caused by anything other than the removal of the nodes.
I'm hoping there is some key piece that I am missing or an area I have yet to explore that will help me out here.
The environment is not memory or CPU constrained in any way. I've profiled the code and see that the majority of the time is spent in the TinkerVertex.remove method. This is doubly strange because creating these nodes takes less than a second.
I've been able to improve this a bit by batching the drops and running them on separate threads, as in this question: Improve performance removing TinkerGraph vertices vertices
However, 10-15 seconds is still too long as I'm hoping to have this be a synchronous operation.
I've considered following something like this but that feels like overkill for dropping less than 5k elements...
To note, the size of the graph is around 110k vertices and 150k edges.
I've tried to profile the gremlin query but it seems that you can't profile through the JVM using:
g.V(ids).drop().iterate().profile()
I've tried various ways of writing the query for profiling but was unable to get it to work.
I'm hoping there is just something I'm missing that will help get this resolved.
Upvotes: 2
Views: 330
Reputation: 14371
As mentioned in the comments, it definitely seems unusual that this operation is taking so long, unless the machine being used is very busy performing other tasks. Using my laptop (16GB RAM, modest CPU and other specs) I can drop the entire air-routes graph (3,747 nodes and 57,660 edges) in milliseconds from the Gremlin Console.
gremlin> Gremlin.version
==>3.6.0
gremlin> g
==>graphtraversalsource[tinkergraph[vertices:3747 edges:57660], standard]
gremlin> g.V().drop().profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[])                                          3747        3747           6.226     7.52
DropStep                                                                                      76.587    92.48
                                            >TOTAL                     -           -          82.813        -
gremlin> g
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
I also tried dropping a list of 1000 nodes by ID, as follows, and still saw millisecond timings.
gremlin> g
tinkergraph[vertices:3747 edges:57660]
gremlin> a=[] ; for (x in (1..1000)) {a << x}
==>null
gremlin> a.size()
==>1000
gremlin> g.V(a).drop().profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[1, 2, 3, 4, 5, 6, 7, 8,...                  1000        1000           2.677    13.87
DropStep                                                                                      16.626    86.13
                                            >TOTAL                     -           -          19.304        -
gremlin> g
==>graphtraversalsource[tinkergraph[vertices:2747 edges:9331], standard]
Perhaps see if you can get a profile from your Java code using a query without iterate (it's not needed, as profile should be the last step of the traversal; from Java, calling next() on it returns the metrics). Also check for any unusual GC activity. I would also check whether you see this same issue using the Gremlin Console. Something is definitely odd here. If none of these investigations bear fruit, perhaps update the question to show the exact Java code you are using.
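One rough way to check for GC activity from inside the JVM is to compare the garbage collector counters before and after the slow call. The sketch below uses only the standard java.lang.management API. The GcProbe class, its measure method and the placeholder workload are hypothetical names made up for illustration, not part of TinkerPop. In the real application the Runnable would wrap the drop traversal instead.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of TinkerPop): reports wall time and GC
// activity observed while a task runs.
final class GcProbe {

    /** Elapsed wall time plus the GC collections and GC time seen during the task. */
    static final class Result {
        final long elapsedMillis;
        final long gcCount;
        final long gcMillis;

        Result(long elapsedMillis, long gcCount, long gcMillis) {
            this.elapsedMillis = elapsedMillis;
            this.gcCount = gcCount;
            this.gcMillis = gcMillis;
        }
    }

    /** Sums collection count and accumulated collection time over all collectors. */
    private static long[] gcTotals() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report a value.
            count += Math.max(0, bean.getCollectionCount());
            millis += Math.max(0, bean.getCollectionTime());
        }
        return new long[] {count, millis};
    }

    /** Runs the task once and returns the timing and GC deltas around it. */
    static Result measure(Runnable task) {
        long[] before = gcTotals();
        long start = System.nanoTime();
        task.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long[] after = gcTotals();
        return new Result(elapsedMillis, after[0] - before[0], after[1] - before[1]);
    }

    public static void main(String[] args) {
        // Placeholder workload. In the real code this would be something like:
        //   () -> g.V(ids).drop().iterate()
        Result r = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        });
        System.out.println("elapsed ms: " + r.elapsedMillis
                + ", GC collections: " + r.gcCount
                + ", GC ms: " + r.gcMillis);
    }
}
```

If the GC time reported is a large share of the elapsed time, memory pressure is a more likely culprit than the drop itself. The counters cover the whole JVM, so other threads allocating at the same time will show up in the numbers as well.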
Upvotes: 1