Reputation: 90
I am using Titan (version 1.0.0) with a Cassandra storage backend. My database is very large (millions of vertices and edges). I use Elasticsearch for indexing. It does a very good job, and my queries return thousands of vertices (~40,000) relatively easily and quickly. But I have a performance issue when I try to iterate over those vertices and retrieve basic data stored in vertex properties. It takes almost 1 minute!
Using Java 8 parallel streams improves performance significantly, but not enough (10 seconds instead of 1 minute).
Suppose I have thousands of vertices, each with a location property and a timestamp. I want to retrieve only the vertices whose location (Geoshape) lies within a queried area and collect the distinct timestamps.
This is part of my Java code using Java 8 parallel streams:
TitanTransaction tt = titanWraper.getNewTransaction();
PropertyKey timestampKey = tt.getPropertyKey(TIME_STAMP);
TitanGraphQuery graphQuery = tt.query().has(LOCATION, Geo.WITHIN, cLocation);
Spliterator<TitanVertex> locationsSpl = graphQuery.vertices().spliterator();
Set<String> locationTimestamps = StreamSupport.stream(locationsSpl, true)
        .map(locVertex -> { // map location vertices to timestamp String
            String timestamp = locVertex.valueOrNull(timestampKey);
            // this iteration takes about 10 sec to iterate over 40000 vertices
            return timestamp;
        })
        .distinct()
        .collect(Collectors.toSet());
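Two details of the stream pipeline above can be seen in isolation: `Collectors.toSet()` already discards duplicates, so a preceding `distinct()` adds work without changing the result, and `valueOrNull` can put `null` into the set when a vertex has no timestamp. The following is only a minimal sketch with plain strings standing in for Titan vertices. The class and method names (`DistinctTimestamps`, `distinctNonNull`) are hypothetical and not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical helper: plain strings stand in for the timestamp values read from vertices.
class DistinctTimestamps {

    // Collects the distinct non-null values of any Iterable using a parallel stream.
    static Set<String> distinctNonNull(Iterable<String> values) {
        Spliterator<String> spl = values.spliterator();
        return StreamSupport.stream(spl, true)
                .filter(Objects::nonNull)      // drop missing timestamps
                .collect(Collectors.toSet());  // the set removes duplicates; no distinct() needed
    }

    public static void main(String[] args) {
        List<String> timestamps = Arrays.asList("t1", "t2", null, "t1", "t3", "t2");
        System.out.println(distinctNonNull(timestamps).size()); // prints 3
    }
}
```

This does not address the time spent loading properties from the storage backend. It only trims the in-memory part of the pipeline.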
The same code using standard Java iteration:
TitanTransaction tt = titanWraper.getNewTransaction();
PropertyKey timestampKey = tt.getPropertyKey(TIME_STAMP);
TitanGraphQuery graphQuery = tt.query().has(LOCATION, Geo.WITHIN, cLocation);
Set<String> locationTimestamps = new HashSet<>();
for (TitanVertex locVertex : (Iterable<TitanVertex>) graphQuery.vertices()) {
    String timestamp = locVertex.valueOrNull(timestampKey);
    locationTimestamps.add(timestamp);
    // this iteration takes about 45 sec to iterate over 40000 vertices
}
This performance is very disappointing, and it would be even worse if the result were around 1 million vertices. I am trying to understand the cause of this issue. I expected iterating over those vertices to take less than 1 second.
Upvotes: 0
Views: 149
Reputation: 90
The same query written as a Gremlin traversal instead of a graph query performs much better, and the code is much shorter:
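TitanTransaction tt = graph.newTransaction();
// Geo.geoWithin is the traversal counterpart of Geo.WITHIN; P.within checks membership in a collection of values
Set<String> locationTimestamps = tt.traversal().V().has(LOCATION, Geo.geoWithin(cLocation))
        .values(TIME_STAMP)
        .dedup() // deduplicate the timestamp values themselves
        .toSet();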
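To compare the graph-query and traversal versions on the same data, each one can be wrapped in a small timing helper. The sketch below is an assumption about how such a measurement might be taken, not the method used for the numbers quoted in this thread. `QueryTimer`, `Timed` and `time` are hypothetical names, and the lambda in `main` is a stand-in for the real query.

```java
import java.util.function.Supplier;

// Hypothetical timing helper for comparing two ways of running the same query.
class QueryTimer {

    // Pairs a result with the elapsed wall-clock time in milliseconds.
    static final class Timed<T> {
        final T result;
        final long millis;

        Timed(T result, long millis) {
            this.result = result;
            this.millis = millis;
        }
    }

    // Runs the supplied work once and records how long it took.
    static <T> Timed<T> time(Supplier<T> work) {
        long start = System.nanoTime();
        T result = work.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(result, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in workload; in practice the lambda would run one of the queries above.
        Timed<Integer> timed = time(() -> 40_000 / 2);
        System.out.println("result=" + timed.result + ", took " + timed.millis + " ms");
    }
}
```

A single run like this includes JIT warm-up and cold caches, so repeating each measurement a few times and discarding the first run gives a fairer comparison.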
Upvotes: 0