Amnesiac
Amnesiac

Reputation: 659

How to traverse & query faster TitanGraph with Cassandra as a backend?

I have millions of nodes stored in Titan 1.0.0 with Cassandra 2.2.4. I want to retrieve graph from Cassandra and query or traverse it in fast way.

If I build index in a code,

mgmt.buildIndex("nameSearchIndex", Vertex.class).addKey(namep, Mapping.TEXT.asParameter()).buildMixedIndex("search");
mgmt.buildIndex("addressSearchIndex", Vertex.class).addKey(addressp, Mapping.TEXT.asParameter()).buildMixedIndex("search");

Still the querying seems to be slower.

When I use

g.traversal().V().count() 

it still gives warning - please use indexes, when I have already build indexes in code. Is there any specific configuration to forcefully activate indexes? How to query graph with using indexes?

g.traversal().V().has("Name","Jason") 

Does this query uses indexes? if not then how do I make use of indexes to query faster?

Can Spark be used for fast traversal? How to use SparkComputerGraph for the same? I am not able to find the configurations for CassnadraInputFormat with Spark.

Thanks.

Upvotes: 1

Views: 354

Answers (2)

Jason Plurad
Jason Plurad

Reputation: 6782

This answer was cross-posted on the Titan mailing list.

Indexing is useful for doing fast traversals, but ultimately "fast queries" depends on many factors, including your graph model/volume/shape and the types of questions you are trying to answer.

Read Chapter 8 "Indexing for better Performance" in the Titan docs, and digest the differences between the different types: Composite, Mixed, and Vertex-centric.

Based on the example query you posted, and as Daniel noted, it looks to me like an exact match type of query, so I would start with a Composite index. You can cut and paste this to try it out in the Titan Console.

graph = TitanFactory.open('inmemory')
mgmt = graph.openManagement()
name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
nameIndex = mgmt.buildIndex('nameIndex',Vertex.class).addKey(name).buildCompositeIndex()
mgmt.commit()
graph.addVertex('name','jason')
g = graph.traversal()
g.V().has('name','jason') // no warning should appear

If after reading the Composite vs Mixed Index section you decide that a Mixed index (backed by Elasticsearch, Solr, or Lucene) is what you really need, read Chapter 20 "Index Parameters and Full-Text Search", and digest the differences between the mappings TEXT, STRING, and TEXTSTRING.

Here's an example that uses a STRING mixed index

graph = TitanFactory.build().set('storage.backend','inmemory').set('index.search.backend','elasticsearch').open()
mgmt = graph.openManagement()
name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
nameIndex = mgmt.buildIndex('nameIndex',Vertex.class).addKey(name, Mapping.STRING.asParameter()).buildMixedIndex("search")
mgmt.commit()
graph.addVertex('name','jason')
g = graph.traversal()
g.V().has('name','jason') // no warning should appear

Upvotes: 1

Sebastian Good
Sebastian Good

Reputation: 6360

There are lots of questions bundled up in this question.

The indexes you're making are mixed indexes, which are implemented by an external indexing system, such as Solr or ElasticSearch. These would help if you are looking for a vertex with a certain name, such as your .has("Name", "Jason) example.

To find out if an index is being used, I suggest looking into the profile() step in Gremlin. You can read about it here.

Spark is meant to be used for traversals that need to potentially load a graph that is bigger than one machine can hold. What use case is .V().count() important for?

Upvotes: 1

Related Questions