jarandaf

Reputation: 4427

Titan index issues with Cassandra storage backend

I am populating a Titan 1.0.0 single instance with a moderate graph in order to test its query performance. I am using Cassandra 2.0.17 as storage backend.

The problem is that I am not able to create node indexes, and hence cannot query the graph efficiently. I have read the docs and tried to follow them carefully, without much success. I am using the following Groovy script for the schema definition, data population and index creation:

import com.thinkaurelius.titan.core.*;
import com.thinkaurelius.titan.core.schema.*;
import com.thinkaurelius.titan.graphdb.database.management.ManagementSystem;
import java.time.temporal.ChronoUnit;

graph = TitanFactory.open('conf/my-titan.properties');
mgmt = graph.openManagement();

// Build graph schema
//        Node properties
idProp = mgmt.containsPropertyKey('userId') ?
  mgmt.getPropertyKey('userId') : mgmt.makePropertyKey('id').dataType(String.class).cardinality(Cardinality.SINGLE);
isPublicProp = mgmt.containsPropertyKey('isPublic') ?
  mgmt.getPropertyKey('isPublic') : mgmt.makePropertyKey('isPublic').dataType(Boolean.class).cardinality(Cardinality.SINGLE);
completionPercentageProp = mgmt.containsPropertyKey('completionPercentage') ?
  mgmt.getPropertyKey('completionPercentage') : mgmt.makePropertyKey('completionPercentage').dataType(Integer.class).cardinality(Cardinality.SINGLE);
genderProp = mgmt.containsPropertyKey('gender') ?
 mgmt.getPropertyKey('gender') : mgmt.makePropertyKey('gender').dataType(String.class).cardinality(Cardinality.SINGLE);
regionProp = mgmt.containsPropertyKey('region') ?
 mgmt.getPropertyKey('region') : mgmt.makePropertyKey('region').dataType(String.class).cardinality(Cardinality.SINGLE);
lastLoginProp = mgmt.containsPropertyKey('lastLogin') ?
 mgmt.getPropertyKey('lastLogin') : mgmt.makePropertyKey('lastLogin').dataType(String.class).cardinality(Cardinality.SINGLE);
registrationProp = mgmt.containsPropertyKey('registration') ?
 mgmt.getPropertyKey('registration') : mgmt.makePropertyKey('registration').dataType(String.class).cardinality(Cardinality.SINGLE);
ageProp = mgmt.containsPropertyKey('age') ?  mgmt.getPropertyKey('age') : mgmt.makePropertyKey('age').dataType(Integer.class).cardinality(Cardinality.SINGLE);
mgmt.commit();

nUsers = 0
println 'Starting nodes population...';
// Load users
new File('/home/jarandaf/soc-pokec-profiles.txt').eachLine {
  try {
    fields = it.split('\t').take(8);
    userId = fields[0];
    isPublic = fields[1] == '1' ? true : false;
    completionPercentage = fields[2]
    gender = fields[3] == '1' ? 'male' : 'female';
    region = fields[4];
    lastLogin = fields[5];
    registration = fields[6];
    age = fields[7] as int;
    graph.addVertex('userId', userId, 'isPublic', isPublic, 'completionPercentage', completionPercentage, 'gender', gender, 'region', region, 'lastLogin', lastLogin, 'registration', registration, 'age', age);
  } catch (Exception e) {
    // Silently skip...
  }
  nUsers += 1
  if (nUsers % 100000 == 0) println String.valueOf(nUsers) + ' loaded...';
};
graph.tx().commit();
println 'Nodes population finished';

// Index users by userId, gender and age
println 'Getting node properties...';
mgmt = graph.openManagement();
userId = mgmt.getPropertyKey('userId');
gender = mgmt.getPropertyKey('gender');
age = mgmt.getPropertyKey('age');

println 'Building byUserId index...';
if (mgmt.getGraphIndex('byUserId') == null) mgmt.buildIndex('byUserId', Vertex.class).addKey(userId).buildCompositeIndex();
println 'Building byGender index...';
if (mgmt.getGraphIndex('byGender') == null) mgmt.buildIndex('byGender', Vertex.class).addKey(gender).buildCompositeIndex();
println 'Building byAge index...';
if (mgmt.getGraphIndex('byAge') == null) mgmt.buildIndex('byAge', Vertex.class).addKey(age).buildCompositeIndex();
mgmt.commit();

// Wait for the indexes to become available
println 'Awaiting byUserId graph index status...';
ManagementSystem.awaitGraphIndexStatus(graph, 'byUserId')
  .status(SchemaStatus.REGISTERED)
  .timeout(10, ChronoUnit.MINUTES)
  .call();
println 'Awaiting byGender graph index status...';
ManagementSystem.awaitGraphIndexStatus(graph, 'byGender')
  .status(SchemaStatus.REGISTERED)
  .timeout(10, ChronoUnit.MINUTES)
  .call();

println 'Awaiting byAge graph index status...';
ManagementSystem.awaitGraphIndexStatus(graph, 'byAge')
  .status(SchemaStatus.REGISTERED)
  .timeout(10, ChronoUnit.MINUTES)
  .call();

// Reindex the existing data
mgmt = graph.openManagement();
println 'Reindexing data by byUserId index...';
mgmt.updateIndex(mgmt.getGraphIndex('byUserId'), SchemaAction.REINDEX).get();
println 'Reindexing data by byGender index...';
mgmt.updateIndex(mgmt.getGraphIndex('byGender'), SchemaAction.REINDEX).get();
println 'Reindexing data by byAge index...';
mgmt.updateIndex(mgmt.getGraphIndex('byAge'), SchemaAction.REINDEX).get();
mgmt.commit();

// Enable indexes
println 'Enabling byUserId index...'
mgmt.awaitGraphIndexStatus(graph, 'byUserId').status(SchemaStatus.ENABLED).call();
println 'Enabling byGender index...'
mgmt.awaitGraphIndexStatus(graph, 'byGender').status(SchemaStatus.ENABLED).call();
println 'Enabling byAge index...'
mgmt.awaitGraphIndexStatus(graph, 'byAge').status(SchemaStatus.ENABLED).call();

graph.close();

The error I am getting is the following, and it is related to the reindex phase:

08:24:26 ERROR com.thinkaurelius.titan.graphdb.database.management.ManagementLogger  - Evicted [2@0ac717511509-mybox] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardtitantx[0x4b8696a4], standardtitantx[0x2d39f30a], standardtitantx[0x0da9172d], standardtitantx[0x7c6c7909], standardtitantx[0x79dd0a38], standardtitantx[0x5999c49e], standardtitantx[0x5aaba4a7]]
08:24:26 ERROR com.thinkaurelius.titan.graphdb.database.management.ManagementLogger  - Evicted [3@0ac717511509-mybox] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardtitantx[0x4b8696a4], standardtitantx[0x2d39f30a], standardtitantx[0x0da9172d], standardtitantx[0x7c6c7909], standardtitantx[0x79dd0a38], standardtitantx[0x5999c49e], standardtitantx[0x5aaba4a7]]
08:24:26 ERROR com.thinkaurelius.titan.graphdb.database.management.ManagementLogger  - Evicted [4@0ac717511509-mybox] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardtitantx[0x4b8696a4], standardtitantx[0x2d39f30a], standardtitantx[0x0da9172d], standardtitantx[0x7c6c7909], standardtitantx[0x79dd0a38], standardtitantx[0x5999c49e], standardtitantx[0x5aaba4a7]]

Any hints on this would be much appreciated.

Upvotes: 1

Views: 154

Answers (1)

Florian Hockmann

Reputation: 2809

The errors you get indicate that you have open transactions when you try to modify the schema. Titan needs to wait for all transactions to complete before it can modify the schema. See the answer from Matthias Broecheler on the mailing list for more information.
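As a rough sketch of how to make sure nothing is left open before touching the schema (assuming the same `conf/my-titan.properties` as in the question): roll back the transaction that is implicitly open on the current thread, then list the Titan instances registered with the storage backend. The instance id in the commented-out line is a placeholder, not a real value.

```groovy
import com.thinkaurelius.titan.core.TitanFactory

graph = TitanFactory.open('conf/my-titan.properties')

// Close the transaction that is implicitly open on this thread, if any,
// so that it cannot block the schema change.
graph.tx().rollback()

mgmt = graph.openManagement()

// List every Titan instance registered with the storage backend.
// The current instance is marked with "(current)".
mgmt.getOpenInstances().each { println it }

// If an instance from a crashed process is still listed, it can be removed.
// '<stale-instance-id>' is a placeholder for an id printed above.
// mgmt.forceCloseInstance('<stale-instance-id>')

mgmt.commit()
```

Only force-close instances that you know are dead; an instance that is still running should be shut down normally instead.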

In general, you should avoid reindexing if possible as it requires Titan to walk over all vertices to see whether they need to be added to the index that should be updated. The documentation contains more information about this process.

For your use case, you can simply create all indexes before you load any data. When you then load the data after the indexes are ready, each new vertex is added to the indexes as it is written. That way, you should be able to use the indexes immediately.

A minimal example for the schema creation in Groovy (but it should be basically the same in Java):

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.Cardinality;
import org.apache.tinkerpop.gremlin.structure.Vertex;

graph = TitanFactory.open('conf/my-titan.properties')

mgmt = graph.openManagement()

// the property key that will be indexed; make() is what actually creates the key
id = mgmt.makePropertyKey('id').dataType(String.class).cardinality(Cardinality.SINGLE).make()

// some other properties that will not be indexed
mgmt.makePropertyKey('isPublic').dataType(Boolean.class).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey('completionPercentage').dataType(Integer.class).cardinality(Cardinality.SINGLE).make()

// I prefer to use vertex labels to differentiate between different 'types' of vertices but this isn't necessary
user = mgmt.makeVertexLabel('User').make()

// composite index on id, restricted to vertices that carry the label User
mgmt.buildIndex('UserById', Vertex.class).addKey(id).indexOnly(user).buildCompositeIndex()

mgmt.commit()

I removed all the checks for already existing schema elements for simplicity, but you can of course add them again. After the schema creation, you can add your data just like before.
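A minimal sketch of the loading step against that schema could look like the following. It assumes the file path and tab-separated layout from the question, sets only the `id` and `isPublic` properties for brevity, and uses a commit interval of 10000 that is an arbitrary choice, not a recommendation. Because the index above is restricted with `indexOnly`, the vertices have to be created with the label `User` to end up in it.

```groovy
import com.thinkaurelius.titan.core.TitanFactory
import org.apache.tinkerpop.gremlin.structure.T

graph = TitanFactory.open('conf/my-titan.properties')

count = 0
new File('/home/jarandaf/soc-pokec-profiles.txt').eachLine { line ->
  fields = line.split('\t')
  // The label 'User' matches the indexOnly() restriction of the UserById index.
  graph.addVertex(T.label, 'User', 'id', fields[0], 'isPublic', fields[1] == '1')
  // Commit in batches so that no single transaction grows too large.
  if (++count % 10000 == 0) graph.tx().commit()
}
graph.tx().commit()

// Look up a user by id; the label is part of the query because the index is label-restricted.
g = graph.traversal()
println g.V().hasLabel('User').has('id', '1').toList()

graph.close()
```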

A final note about index management: try to always define the property keys that you want to index in the same transaction in which you create the index. Otherwise, Titan cannot know whether there is already data that needs to be added to the new index, which again requires a complete scan of all data. This might require choosing a different name for a property. For example, when you add a new vertex label post, you might want to use a new name like postId instead of reusing the property id, to avoid a scan of all existing data.
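To illustrate that last point, here is a sketch of adding the post label later on. The index name `PostById` is made up for this example; `postId` and `post` are the names suggested above. Because the key is created in the same management transaction as the index, no existing data can use it yet, so no reindex should be needed.

```groovy
import com.thinkaurelius.titan.core.TitanFactory
import com.thinkaurelius.titan.core.Cardinality
import org.apache.tinkerpop.gremlin.structure.Vertex

graph = TitanFactory.open('conf/my-titan.properties')

// Make sure no transaction is open on this thread before changing the schema.
graph.tx().rollback()

mgmt = graph.openManagement()

// New key and new label, created together with the index in one transaction.
postId = mgmt.makePropertyKey('postId').dataType(String.class).cardinality(Cardinality.SINGLE).make()
post = mgmt.makeVertexLabel('post').make()

// 'PostById' is a hypothetical index name chosen for this example.
mgmt.buildIndex('PostById', Vertex.class).addKey(postId).indexOnly(post).buildCompositeIndex()

mgmt.commit()
graph.close()
```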

Upvotes: 2
