TITAN : Identify and remove duplicate vertices in graph

Question

I am using TITAN 0.4 over Cassandra. I have indexed my key ("ip_address" in my case), but as NON-UNIQUE, for performance and scalability. The problem is that the graph now allows duplicate vertices. I am running a background task that cleans up the duplicates by iterating through all vertices. What is the best approach to identify a duplicate vertex in the graph? The estimated size of the production graph is around 10M to 15M vertices, possibly more. Does the TITAN index have any feature that makes it easy to identify duplicates? Thanks in advance.

Index creation Gremlin script

g.makeKey("ip_address").dataType(String.class).indexed("standard",Vertex.class).make();
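To make the cleanup task concrete, here is a minimal sketch of the single-pass grouping the background job would need. It runs on plain Java collections and is not TITAN API code. `VertexRef`, `DuplicateFinder`, and `findDuplicates` are hypothetical names introduced only for illustration. The sketch assumes the first vertex seen for each "ip_address" is the one to keep.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFinder {

    /** Hypothetical stand-in for a graph vertex: its id plus the indexed key value. */
    public static final class VertexRef {
        final long id;
        final String ipAddress;

        public VertexRef(long id, String ipAddress) {
            this.id = id;
            this.ipAddress = ipAddress;
        }
    }

    /**
     * Single pass over all vertices. The first vertex seen for each ip_address
     * is treated as the survivor (an assumption of this sketch); every later
     * vertex with the same value is reported as a duplicate of that survivor.
     *
     * @return map of survivor id -> ids of its duplicates
     */
    public static Map<Long, List<Long>> findDuplicates(Iterable<VertexRef> vertices) {
        Map<String, Long> survivorByIp = new HashMap<>();
        Map<Long, List<Long>> duplicates = new LinkedHashMap<>();
        for (VertexRef v : vertices) {
            if (v.ipAddress == null) {
                continue; // vertex without the key: nothing to compare
            }
            // putIfAbsent returns the existing survivor id, or null on first sight
            Long survivor = survivorByIp.putIfAbsent(v.ipAddress, v.id);
            if (survivor != null) {
                duplicates.computeIfAbsent(survivor, k -> new ArrayList<>()).add(v.id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<VertexRef> sample = Arrays.asList(
                new VertexRef(1L, "10.0.0.1"),
                new VertexRef(2L, "10.0.0.2"),
                new VertexRef(3L, "10.0.0.1"),
                new VertexRef(4L, "10.0.0.1"));
        System.out.println(findDuplicates(sample));
    }
}
```

The map holds one entry per distinct address, so memory grows with the number of distinct values rather than with the number of duplicates. In the real graph, the reported ids would still have to be removed, with any edges re-attached to the survivor first. The sketch leaves that step out.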

Answers (1)
