Titan Database: performance issue when iterating over thousands of vertices in Java code

Question

I am using Titan Database (version 1.0.0) with a Cassandra storage backend. My database is very large (millions of vertices and edges). I am using Elasticsearch for indexing. It does a very good job, and my queries return thousands (~40,000) of vertices relatively easily and quickly. But I have a performance issue when I try to iterate over those vertices and retrieve basic data stored in vertex properties. It takes almost 1 minute!

Using Java 8 parallel streams improves performance significantly, but not enough (10 sec instead of 1 min).

Consider that I have thousands of vertices, each with a location property and a timestamp. I want to retrieve only the vertices whose location (Geoshape) lies within the queried area and collect the distinct timestamps.

This is part of my Java code using Java 8 parallel streams:

TitanTransaction tt = titanWraper.getNewTransaction();
PropertyKey timestampKey = tt.getPropertyKey(TIME_STAMP);
TitanGraphQuery graphQuery = tt.query().has(LOCATION, Geo.WITHIN, cLocation);
Spliterator<TitanVertex> locationsSpl = graphQuery.vertices().spliterator();

Set<String> locationTimestamps = StreamSupport.stream(locationsSpl, true)
        .map(locVertex -> { // map location vertices to timestamp String
            String timestamp = locVertex.valueOrNull(timestampKey);

            // this iteration takes about 10 sec to iterate over 40000 vertices
            return timestamp;
        })
        .distinct()
        .collect(Collectors.toSet());

The same code using a standard Java loop:

TitanTransaction tt = titanWraper.getNewTransaction();
PropertyKey timestampKey = tt.getPropertyKey(TIME_STAMP);
TitanGraphQuery graphQuery = tt.query().has(LOCATION, Geo.WITHIN, cLocation);
Set<String> locationTimestamps = new HashSet<>();
for (TitanVertex locVertex : (Iterable<TitanVertex>) graphQuery.vertices()) {
    String timestamp = locVertex.valueOrNull(timestampKey);
    locationTimestamps.add(timestamp);
    // this iteration takes about 45 sec to iterate over 40000 vertices
}
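
A side note on the stream version above: Collectors.toSet() already discards duplicates, so the distinct() call before it is extra work and can be dropped. Below is a minimal sketch of that collection step that runs without Titan. FakeVertex and distinctTimestamps are hypothetical names used only for illustration, with FakeVertex standing in for TitanVertex, and the null filter is an assumption about how vertices without a timestamp should be handled. It does not address the time spent loading properties from the backend.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

public class DistinctTimestamps {

    // Hypothetical stand-in for TitanVertex: it only carries a timestamp.
    static final class FakeVertex {
        final String timestamp;

        FakeVertex(String timestamp) {
            this.timestamp = timestamp;
        }
    }

    // Collects the distinct, non-null timestamps. No distinct() call is
    // needed because a Set never holds the same value twice.
    static Set<String> distinctTimestamps(List<FakeVertex> vertices) {
        return vertices.parallelStream()
                .map(v -> v.timestamp)
                .filter(Objects::nonNull) // assumption: skip vertices without a timestamp
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<FakeVertex> vertices = Arrays.asList(
                new FakeVertex("2016-01-01T10:00"),
                new FakeVertex("2016-01-01T10:00"),
                new FakeVertex("2016-01-02T08:30"),
                new FakeVertex(null));
        // Two distinct non-null timestamps remain, so this prints 2.
        System.out.println(distinctTimestamps(vertices).size());
    }
}
```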

This performance is very disappointing, and it will be even worse if the result is around 1 million vertices. I am trying to understand the reason for this issue. I expected iterating over those vertices to take less than 1 sec.

OctopusSD · Accepted Answer

The same query, written as a Gremlin traversal instead of a graph query, performs much better and the code is much shorter:

TitanTransaction tt = graph.newTransaction();
Set locationTimestamps = tt.traversal().V().has(LOCATION, P.within(cLocation))
    .dedup(TIME_STAMP)
    .values(TIME_STAMP)
    .toSet();
