Thomas M. DuBuisson

Reputation: 64740

Titan index update takes too long

Even on an empty database, creating an index in Titan 1.0 takes several minutes. The delay appears to be the same every time, which suggests a fixed, unnecessary wait rather than real work.

My question is this: How do I shorten or eliminate the time Titan takes to reindex? Conceptually, since no work is being done, the time should be minimal, certainly not four minutes.

(N.B. I have previously been pointed to a solution that simply makes Titan wait the full delay without timing out. This is the wrong solution - I want to eliminate the delay entirely.)

The code I'm using to set up the database from scratch is:

graph = ... a local cassandra instance ...
graph.tx().rollback()

// 1. Check if the index already exists
mgmt = graph.openManagement()
i = mgmt.getGraphIndex('byIdent')
if(! i) {
  // 1a. If the index does not exist, add it
  idKey = mgmt.getPropertyKey('ident')
  idKey = idKey ? idKey : mgmt.makePropertyKey('ident').dataType(String.class).make()
  mgmt.buildIndex('byIdent', Vertex.class).addKey(idKey).buildCompositeIndex()
  mgmt.commit()
  graph.tx().commit()

  mgmt  = graph.openManagement()
  idKey = mgmt.getPropertyKey('ident')
  idx   = mgmt.getGraphIndex('byIdent')
  // 1b. Wait for index availability
  if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
  }
  // 1c. Now reindex, even though the DB is usually empty.
  mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.REINDEX).get()
  mgmt.commit()
  mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.ENABLED).call()
} else { mgmt.commit() }

It appears to be the updateIndex...REINDEX call that blocks until it times out. Is this a known problem, or a works-for-me/won't-fix situation? Am I doing something wrong?

EDIT: Disabling the REINDEX, as discussed in the comments, is not actually a fix, because the index does not seem to become active. I now see:

WARN  com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx  - Query requires iterating over all vertices [(myindexedkey = somevalue)]. For better performance, use indexes

Upvotes: 3

Views: 347

Answers (1)

Thomas M. DuBuisson

Reputation: 64740

The delay was entirely unnecessary and was caused by my misuse of Titan (though the same pattern does appear in chapter 28 of the Titan 1.0.0 documentation).

Do not block in a transaction!

Instead of:

  mgmt  = graph.openManagement()
  idKey = mgmt.getPropertyKey('ident')
  idx   = mgmt.getGraphIndex('byIdent')
  // 1b. Wait for index availability
  if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
  }

Consider:

  mgmt  = graph.openManagement()
  idKey = mgmt.getPropertyKey('ident')
  idx   = mgmt.getGraphIndex('byIdent')
  // Wait for index availability
  if ( idx.getIndexStatus(idKey).equals(SchemaStatus.INSTALLED) ) {
    mgmt.commit()
    mgmt.awaitGraphIndexStatus(graph, 'byIdent').status(SchemaStatus.REGISTERED).call()
  } else { mgmt.commit() }

Use ENABLE_INDEX

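Not: mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.REINDEX).get()

Rather: mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.ENABLE_INDEX).get()

Note that the management transaction committed before the wait is closed, so open a new one with graph.openManagement() for this call and commit it afterwards.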
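Putting both changes together, the whole setup might look roughly like the sketch below. It is not tested against a live cluster. The helper names ensureIdentIndex and readIdentIndexStatus are made up for illustration, and graph is assumed to be an already-open TitanGraph (how it is opened is left out, as in the question). The await call is written in its static form, ManagementSystem.awaitGraphIndexStatus, which is what the mgmt.awaitGraphIndexStatus calls above resolve to in Groovy. Skipping REINDEX is only reasonable here because the database is empty. Data written before the index existed would still need a reindex.

```groovy
import com.thinkaurelius.titan.core.schema.SchemaAction
import com.thinkaurelius.titan.core.schema.SchemaStatus
import com.thinkaurelius.titan.graphdb.database.management.ManagementSystem
import org.apache.tinkerpop.gremlin.structure.Vertex

// Hypothetical helper: read the index status in a short-lived management
// transaction and close that transaction before returning.
def readIdentIndexStatus(graph) {
  def mgmt = graph.openManagement()
  def status = mgmt.getGraphIndex('byIdent').getIndexStatus(mgmt.getPropertyKey('ident'))
  mgmt.commit()
  return status
}

// Hypothetical helper: create and enable 'byIdent' without ever blocking
// while a management transaction is open.
def ensureIdentIndex(graph) {
  graph.tx().rollback()

  // 1. Create the index only if it does not exist yet.
  def mgmt = graph.openManagement()
  if (mgmt.getGraphIndex('byIdent')) {
    mgmt.commit()
    return
  }
  def idKey = mgmt.getPropertyKey('ident') ?:
      mgmt.makePropertyKey('ident').dataType(String.class).make()
  mgmt.buildIndex('byIdent', Vertex.class).addKey(idKey).buildCompositeIndex()
  mgmt.commit()

  // 2. Wait for REGISTERED only if needed, with no transaction open.
  if (readIdentIndexStatus(graph) == SchemaStatus.INSTALLED) {
    ManagementSystem.awaitGraphIndexStatus(graph, 'byIdent')
        .status(SchemaStatus.REGISTERED).call()
  }

  // 3. Enable the index in a fresh management transaction, then wait
  //    for ENABLED after that transaction has been committed.
  if (readIdentIndexStatus(graph) == SchemaStatus.REGISTERED) {
    mgmt = graph.openManagement()
    mgmt.updateIndex(mgmt.getGraphIndex('byIdent'), SchemaAction.ENABLE_INDEX).get()
    mgmt.commit()
    ManagementSystem.awaitGraphIndexStatus(graph, 'byIdent')
        .status(SchemaStatus.ENABLED).call()
  }
}
```

The status is re-read before step 3 rather than assumed, so the enable step is skipped if the wait in step 2 timed out or if the index was already enabled when it was created.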

Upvotes: 3
