Reputation: 659
I have millions of nodes stored in Titan 1.0.0 with Cassandra 2.2.4. I want to retrieve graph from Cassandra and query or traverse it in fast way.
If I build index in a code,
mgmt.buildIndex("nameSearchIndex", Vertex.class).addKey(namep, Mapping.TEXT.asParameter()).buildMixedIndex("search");
mgmt.buildIndex("addressSearchIndex", Vertex.class).addKey(addressp, Mapping.TEXT.asParameter()).buildMixedIndex("search");
Still the querying seems to be slower.
When I use
g.traversal().V().count()
it still gives warning - please use indexes, when I have already build indexes in code. Is there any specific configuration to forcefully activate indexes? How to query graph with using indexes?
g.traversal().V().has("Name","Jason")
Does this query uses indexes? if not then how do I make use of indexes to query faster?
Can Spark be used for fast traversal? How to use SparkComputerGraph for the same? I am not able to find the configurations for CassnadraInputFormat with Spark.
Thanks.
Upvotes: 1
Views: 354
Reputation: 6782
This answer was cross-posted on the Titan mailing list.
Indexing is useful for doing fast traversals, but ultimately "fast queries" depends on many factors, including your graph model/volume/shape and the types of questions you are trying to answer.
Read Chapter 8 "Indexing for better Performance" in the Titan docs, and digest the differences between the different types: Composite, Mixed, and Vertex-centric.
Based on the example query you posted, and as Daniel noted, it looks to me like an exact match type of query, so I would start with a Composite index. You can cut and paste this to try it out in the Titan Console.
graph = TitanFactory.open('inmemory')
mgmt = graph.openManagement()
name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
nameIndex = mgmt.buildIndex('nameIndex',Vertex.class).addKey(name).buildCompositeIndex()
mgmt.commit()
graph.addVertex('name','jason')
g = graph.traversal()
g.V().has('name','jason') // no warning should appear
If after reading the Composite vs Mixed Index section you decide that a Mixed index (backed by Elasticsearch, Solr, or Lucene) is what you really need, read Chapter 20 "Index Parameters and Full-Text Search", and digest the differences between the mappings TEXT, STRING, and TEXTSTRING.
Here's an example that uses a STRING mixed index
graph = TitanFactory.build().set('storage.backend','inmemory').set('index.search.backend','elasticsearch').open()
mgmt = graph.openManagement()
name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
nameIndex = mgmt.buildIndex('nameIndex',Vertex.class).addKey(name, Mapping.STRING.asParameter()).buildMixedIndex("search")
mgmt.commit()
graph.addVertex('name','jason')
g = graph.traversal()
g.V().has('name','jason') // no warning should appear
Upvotes: 1
Reputation: 6360
There are lots of questions bundled up in this question.
The indexes you're making are mixed indexes, which are implemented by an external indexing system, such as Solr or ElasticSearch. These would help if you are looking for a vertex with a certain name, such as your .has("Name", "Jason)
example.
To find out if an index is being used, I suggest looking into the profile()
step in Gremlin. You can read about it here.
Spark is meant to be used for traversals that need to potentially load a graph that is bigger than one machine can hold. What use case is .V().count()
important for?
Upvotes: 1