Reputation: 64740
Even on an empty database, creating an index in Titan 1.0 takes several minutes. The delay appears to be the same fixed length every time, which suggests an unnecessary wait or timeout rather than real work.
My question is this: How do I shorten or eliminate the amount of time Titan takes to reindex? Conceptually, since no work is being done, the time should be minimal, certainly not four minutes.
(N.B. I have previously been pointed to a solution that simply makes Titan wait the full delay without timing out. This is the wrong solution - I want to eliminate the delay entirely.)
The code I'm using to set up the database from scratch is:
graph = ... a local cassandra instance ...
graph.tx().rollback()
// 1. Check if the index already exists
mgmt = graph.openManagement()
i = mgmt.getGraphIndex('byIdent')
if(! i) {
    // 1a. If the index does not exist, add it
    idKey = mgmt.getPropertyKey('ident')
    idKey = idKey ? idKey : mgmt.makePropertyKey('ident').dataType(String.class).make()
    mgmt.buildIndex('byIdent', Vertex.class).addKey(idKey).buildCompositeIndex()
    mgmt.commit()
    graph.tx().commit()
    mgmt = graph.openManagement()
    idKey = mgmt.getPropertyKey('ident')
    idx = mgmt.getGraphIndex('byIdent')
    // 1b. Wait for index availability
    if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
        mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
    }
    // 1c. Now reindex, even though the DB is usually empty.
    mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.REINDEX).get()
    mgmt.commit()
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.ENABLED).call()
} else { mgmt.commit() }
It appears to be the updateIndex...REINDEX call that blocks until it times out. Is this a known problem, or a case of "works for me, won't fix"? Am I doing something wrong?
EDIT: Disabling the REINDEX, as discussed in the comments, is not actually a fix, because the index does not seem to become active. I now see:
WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [(myindexedkey = somevalue)]. For better performance, use indexes
Upvotes: 3
Views: 347
Reputation: 64740
The time delay was entirely unnecessary and due to my misuse of Titan (though the same pattern does appear in chapter 28 of the Titan 1.0.0 documentation).
Do not block in a transaction!
Instead of:
mgmt = graph.openManagement()
idKey = mgmt.getPropertyKey('ident')
idx = mgmt.getGraphIndex('byIdent')
// 1b. Wait for index availability
if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
    // Blocks while the management transaction opened above is still open
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
}
Consider:
mgmt = graph.openManagement()
idKey = mgmt.getPropertyKey('ident')
idx = mgmt.getGraphIndex('byIdent')
// Wait for index availability
if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
    // Commit first so that no management transaction is open while waiting
    mgmt.commit()
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
} else { mgmt.commit() }
Use ENABLE_INDEX
Not: mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.REINDEX).get()
Rather: mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.ENABLE_INDEX).get()
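Putting the two changes together, the whole setup might look roughly like the sketch below. This is only an illustration, not tested against a live cluster. The function name ensureIdentIndex is hypothetical, the graph handle is assumed to be opened elsewhere, and the imports assume a standalone Groovy script against Titan 1.0 (the Gremlin console normally provides them already). Only the INSTALLED, REGISTERED and ENABLED states are handled. Since ENABLE_INDEX does not index existing data, this should only suit the empty-database case from the question.

```groovy
import com.thinkaurelius.titan.core.schema.SchemaAction
import com.thinkaurelius.titan.core.schema.SchemaStatus
import com.thinkaurelius.titan.graphdb.database.management.ManagementSystem
import org.apache.tinkerpop.gremlin.structure.Vertex

// Hypothetical helper: 'graph' is an already opened TitanGraph.
def ensureIdentIndex(graph) {
    // Close any transaction left open on this thread before touching the schema.
    graph.tx().rollback()

    // 1. Create the key and the index if they are missing, then commit right away.
    def mgmt = graph.openManagement()
    if (!mgmt.getGraphIndex('byIdent')) {
        def idKey = mgmt.getPropertyKey('ident') ?:
                mgmt.makePropertyKey('ident').dataType(String.class).make()
        mgmt.buildIndex('byIdent', Vertex.class).addKey(idKey).buildCompositeIndex()
    }
    mgmt.commit()

    // 2. Read the current status in a short-lived management transaction
    //    and commit it BEFORE waiting, so nothing is held open during the wait.
    mgmt = graph.openManagement()
    def status = mgmt.getGraphIndex('byIdent').getIndexStatus(mgmt.getPropertyKey('ident'))
    mgmt.commit()
    if (status == SchemaStatus.INSTALLED) {
        ManagementSystem.awaitGraphIndexStatus(graph, 'byIdent')
                .status(SchemaStatus.REGISTERED).call()
    }

    // 3. Enable the index instead of reindexing (assumes there is no data to index).
    if (status in [SchemaStatus.INSTALLED, SchemaStatus.REGISTERED]) {
        mgmt = graph.openManagement()
        mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.ENABLE_INDEX).get()
        mgmt.commit()
        ManagementSystem.awaitGraphIndexStatus(graph, 'byIdent')
                .status(SchemaStatus.ENABLED).call()
    }
}
```

The point of the structure is that every awaitGraphIndexStatus call happens after the preceding mgmt.commit(), so no management transaction is open while the script is blocked.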
Upvotes: 3