John Stephenson

Reputation: 501

Titan/DynamoDb doesn't release all acquired locks on commit (via gremlin)

Ok, I realise this sounds unlikely and I'm prepared to be shot down on this one but here goes...

I have a gremlin server running against titanDB and dynamoDB (local). I'm running some unit tests that keep failing due to

tx 0x705eafda280e already locked key-column (  8-  0-  0-  0-  0-  0-  0-128, 80-160) when tx 0x70629e1d56bf tried to lock

I'm running the following commands in the Gremlin client console against a clean, completely empty DB (recreated between test runs using Docker images). The aim of this work is to support database upgrade scripts. The original steps were more involved than the below, but this is the minimum needed to reproduce the issue.

(Connect to local 'remote')
:remote connect tinkerpop.server conf/remote.yaml

(Add a unique constraint on a 'databaseMetadata' label which has a single 'version' property)
:> mgmt = graph.openManagement();if (!mgmt.getGraphIndex("bydatabaseMetadataversion")) {graph.tx().rollback();int size = graph.getOpenTransactions().size();for (i = 0; i < size; i++) { try { graph.getOpenTransactions().getAt(0).rollback();} catch(Throwable ex) { }; }; mgmt = graph.openManagement();propertyKey = (!mgmt.containsPropertyKey("version")) ? mgmt.makePropertyKey("version").dataType(String.class).cardinality(Cardinality.SINGLE).make():mgmt.getPropertyKey("version");labelObj = (!mgmt.containsVertexLabel("databaseMetadata")) ? mgmt.makeVertexLabel("databaseMetadata").make():mgmt.getVertexLabel("databaseMetadata");index = mgmt.buildIndex("bydatabaseMetadataversion", Vertex.class).addKey(propertyKey).unique().indexOnly(labelObj).buildCompositeIndex();mgmt.setConsistency(propertyKey, ConsistencyModifier.LOCK);mgmt.setConsistency(index, ConsistencyModifier.LOCK);mgmt.commit();mgmt = graph.openManagement();index = mgmt.getGraphIndex("bydatabaseMetadataversion");propertyKey = mgmt.getPropertyKey("version");if (index.getIndexStatus(propertyKey) == SchemaStatus.INSTALLED) {mgmt.awaitGraphIndexStatus(graph, "bydatabaseMetadataversion").status(SchemaStatus.REGISTERED).timeout(10, java.time.temporal.ChronoUnit.MINUTES).call();}; mgmt.commit();mgmt = graph.openManagement();index = mgmt.getGraphIndex("bydatabaseMetadataversion");propertyKey = mgmt.getPropertyKey("version");if (index.getIndexStatus(propertyKey) != SchemaStatus.ENABLED) {mgmt.commit();mgmt = graph.openManagement();mgmt.updateIndex(mgmt.getGraphIndex("bydatabaseMetadataversion"), SchemaAction.ENABLE_INDEX).get();mgmt.commit();mgmt = graph.openManagement();mgmt.awaitGraphIndexStatus(graph, "bydatabaseMetadataversion").status(SchemaStatus.ENABLED).timeout(10, java.time.temporal.ChronoUnit.MINUTES).call();}; mgmt.commit();} else {index = mgmt.getGraphIndex("bydatabaseMetadataversion");propertyKey = mgmt.getPropertyKey("version");if (index.getIndexStatus(propertyKey) != SchemaStatus.ENABLED) 
{mgmt.awaitGraphIndexStatus(graph, "bydatabaseMetadataversion").status(SchemaStatus.ENABLED).timeout(10, java.time.temporal.ChronoUnit.MINUTES).call();}; mgmt.commit();};

(Add the metadata vertex with initial version '0.0.1')
:> graph.addVertex(label, "databaseMetadata").property("version", "0.0.1");graph.tx().commit();

(Update the metadata vertex with the next version - 0.0.2)
:> g.V().hasLabel("databaseMetadata").has("version", "0.0.1").property("version", "0.0.2").next();g.tx().commit();

(THIS FAILS - Update the metadata vertex with the next version - 0.0.3)
:> g.V().hasLabel("databaseMetadata").has("version", "0.0.2").property("version", "0.0.3").next();g.tx().commit();
tx 0x705eafda280e already locked key-column (  8-  0-  0-  0-  0-  0-  0-128, 80-160) when tx 0x70629e1d56bf tried to lock

Previously I had looked through the titan-dynamodb source and saw that transaction commits/rollbacks etc. are logged, so I changed the log level to get further information (full log file available).

When the 0.0.1 -> 0.0.2 update was executed the following locks were acquired:

titan_server_1  | 120479 [gremlin-server-exec-3] TRACE com.amazon.titan.diskstorage.dynamodb.AbstractDynamoDBStore  - acquiring lock on (  8-  0-  0-  0-  0-  0-  0-128, 80-160) at 123552624951495
titan_server_1  | 120489 [gremlin-server-exec-3] TRACE com.amazon.titan.diskstorage.dynamodb.AbstractDynamoDBStore  - acquiring lock on (  6-137-160- 48- 46- 48- 46-177,  0) at 123552635424334
titan_server_1  | 120489 [gremlin-server-exec-3] TRACE com.amazon.titan.diskstorage.dynamodb.AbstractDynamoDBStore  - acquiring lock on (  6-137-160- 48- 46- 48- 46-178,  0) at 123552635704705

When that transaction was committed, only TWO locks were released.

titan_server_1  | 120722 [gremlin-server-exec-3] DEBUG com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreTransaction  - commit id:0x705eafda280e
titan_server_1  | 120722 [gremlin-server-exec-3] TRACE com.amazon.titan.diskstorage.dynamodb.AbstractDynamoDBStore  - Expiring (  6-137-160- 48- 46- 48- 46-177,  0) in tx 0x705eafda280e because of EXPLICIT
titan_server_1  | 120722 [gremlin-server-exec-3] TRACE com.amazon.titan.diskstorage.dynamodb.AbstractDynamoDBStore  - Expiring (  6-137-160- 48- 46- 48- 46-178,  0) in tx 0x705eafda280e because of EXPLICIT
titan_server_1  | 120722 [gremlin-server-exec-3] DEBUG org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor  - Preparing to iterate results from - RequestMessage{, requestId=09f27811-dcc3-4e53-a749-22828d34997f, op='eval', processor='', args={gremlin=g.V().hasLabel("databaseMetadata").has("version", "0.0.1").property("version", "0.0.2").next();g.tx().commit();, batchSize=64}} - in thread [gremlin-server-exec-3]

The remaining lock ends up expiring after a few minutes, but in the meantime every other update fails as reported.

So, why does that lock not get removed? I suspect it's related to the unique index that's created, so either I've set up the index wrong (a good possibility) or this is a bug.

For ease of consumption, the (slightly shortened) index setup is below:

mgmt = graph.openManagement()
propertyKey = (!mgmt.containsPropertyKey("version")) ? mgmt.makePropertyKey("version").dataType(String.class).cardinality(Cardinality.SINGLE).make():mgmt.getPropertyKey("version")
labelObj = (!mgmt.containsVertexLabel("databaseMetadata")) ? mgmt.makeVertexLabel("databaseMetadata").make():mgmt.getVertexLabel("databaseMetadata")
index = mgmt.buildIndex("bydatabaseMetadataversion", Vertex.class).addKey(propertyKey).unique().indexOnly(labelObj).buildCompositeIndex()
mgmt.setConsistency(propertyKey, ConsistencyModifier.LOCK)
mgmt.setConsistency(index, ConsistencyModifier.LOCK)
mgmt.commit()
mgmt = graph.openManagement()
index = mgmt.getGraphIndex("bydatabaseMetadataversion")
propertyKey = mgmt.getPropertyKey("version")
if (index.getIndexStatus(propertyKey) == SchemaStatus.INSTALLED) {
  mgmt.awaitGraphIndexStatus(graph, "bydatabaseMetadataversion").status(SchemaStatus.REGISTERED).timeout(10, java.time.temporal.ChronoUnit.MINUTES).call()
}
mgmt.commit()
mgmt = graph.openManagement()
index = mgmt.getGraphIndex("bydatabaseMetadataversion")
propertyKey = mgmt.getPropertyKey("version")
if (index.getIndexStatus(propertyKey) != SchemaStatus.ENABLED) {
  mgmt.commit()
  mgmt = graph.openManagement()
  mgmt.updateIndex(mgmt.getGraphIndex("bydatabaseMetadataversion"), SchemaAction.ENABLE_INDEX).get()
  mgmt.commit()
  mgmt = graph.openManagement()
  mgmt.awaitGraphIndexStatus(graph, "bydatabaseMetadataversion").status(SchemaStatus.ENABLED).timeout(10, java.time.temporal.ChronoUnit.MINUTES).call()
}
mgmt.commit()

I know this is a LONG issue description, but any help would be gratefully received!

(I should also say that I tried this against both the local and cloud-based DynamoDB instances and hit the same issue on both, so I came back to the local one and turned on the logging.)

I'm using Titan 1.0.0 and TinkerPop 3, as set in dynamo-titan on GitHub.

Upvotes: 1

Views: 441

Answers (2)

Alexander Patrikalakis

Reputation: 5195

I reproduced this and found your issue. Basically, the LRU cache pulls its expiry time from the storage.lock.expiry-time config option. The default is 5 minutes, so if you try to make changes before 5 minutes pass, the AbstractDynamoDBStore.keyColumnLocalLocks LRU cache will not let you make the second change. By reducing expiry-time and calling Thread.sleep() before making the second change, you allow the second change to claim the lock again and succeed.

//default lock expiry time is 300*1000 ms = 5 minutes. Set to 100ms.
config.setProperty("storage.lock.expiry-time", 100);
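Equivalently, if the graph is configured from a properties file (as in the dynamo-titan samples), the same override can be set there. A minimal sketch; the filename is illustrative, and a short expiry like this is only sensible for test environments, since it weakens lock protection in production:

```properties
# dynamodb-test.properties (illustrative filename, test environments only)
# storage.lock.expiry-time defaults to 300000 ms (5 minutes)
storage.lock.expiry-time=100
```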

Upvotes: 2

Double M

Reputation: 1503

FYI, I have run all of your above code in Java using a BerkeleyDB storage backend.

TitanGraph graph = ...;
TitanManagement mgmt = graph.openManagement();
PropertyKey propertyKey = (!mgmt.containsPropertyKey("version"))
        ? mgmt.makePropertyKey("version").dataType(String.class).cardinality(Cardinality.SINGLE).make()
        : mgmt.getPropertyKey("version");
VertexLabel labelObj = (!mgmt.containsVertexLabel("databaseMetadata"))
        ? mgmt.makeVertexLabel("databaseMetadata").make() 
        : mgmt.getVertexLabel("databaseMetadata");
TitanGraphIndex index = mgmt.buildIndex("bydatabaseMetadataversion", Vertex.class).addKey(propertyKey).unique()
        .indexOnly(labelObj).buildCompositeIndex();
mgmt.setConsistency(propertyKey, ConsistencyModifier.LOCK);
mgmt.setConsistency(index, ConsistencyModifier.LOCK);
mgmt.commit();
mgmt = graph.openManagement();
index = mgmt.getGraphIndex("bydatabaseMetadataversion");
propertyKey = mgmt.getPropertyKey("version");
if (index.getIndexStatus(propertyKey) == SchemaStatus.INSTALLED) {
    try {
        ManagementSystem.awaitGraphIndexStatus(graph,"bydatabaseMetadataversion").status(SchemaStatus.REGISTERED).timeout(10, java.time.temporal.ChronoUnit.MINUTES).call();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
mgmt.commit();
mgmt = graph.openManagement();
index = mgmt.getGraphIndex("bydatabaseMetadataversion");
propertyKey = mgmt.getPropertyKey("version");
if (index.getIndexStatus(propertyKey) != SchemaStatus.ENABLED) {
    mgmt.commit();
    mgmt = graph.openManagement();
    try {
        mgmt.updateIndex(mgmt.getGraphIndex("bydatabaseMetadataversion"), SchemaAction.ENABLE_INDEX).get();
    } catch (InterruptedException | ExecutionException e) {
        e.printStackTrace();
    }
    mgmt.commit();
    mgmt = graph.openManagement();
    try {
        ManagementSystem.awaitGraphIndexStatus(graph, "bydatabaseMetadataversion").status(SchemaStatus.ENABLED)
                        .timeout(10, java.time.temporal.ChronoUnit.MINUTES).call();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
mgmt.commit();

Then the operations on the graph:

GraphTraversalSource g = graph.traversal();

graph.addVertex("databaseMetadata").property("version", "0.0.1");
graph.tx().commit();

g.V().hasLabel("databaseMetadata").has("version", "0.0.1").property("version", "0.0.2").iterate();
g.tx().commit();

g.V().hasLabel("databaseMetadata").has("version", "0.0.1").property("version", "0.0.2").iterate();
g.tx().commit();

g.V().hasLabel("databaseMetadata").has("version", "0.0.2").property("version", "0.0.3").iterate();
g.tx().commit();

g.V().hasLabel("databaseMetadata").has("version").properties("version").forEachRemaining(prop -> {
    System.out.println("Version: " + prop.value());
});

The result was:

Version: 0.0.3

Note that the iterate() alteration of your query only applies to Java; your scripts should work as they are. Given the result of my experiment, I strongly suspect that the DynamoDB backend is causing the trouble.

Upvotes: 2
