Horcrux7

Reputation: 24447

What is the cause for the GoneException from the Cosmos DB with async requests?

I use the Cosmos Java driver 4.12.0 with the current Cosmos DB emulator. I need to write multiple documents in one step. My code works with the synchronous API. To improve the speed, I tried the asynchronous API, but it always fails with a GoneException after a short time. What can be the cause of the GoneException?

{
   "ClassName":"GoneException",
   "userAgent":"azsdk-java-cosmos/4.12.0 Windows10/10.0 JRE/11.0.7",
   "statusCode":410,
   "resourceAddress":"rntbd://10.10.10.10:10253/apps/DocDbApp/services/DocDbServer15/partitions/a4cb495b-38c8-11e6-8106-8cdcd42c33be/replicas/1p/",
   "innerErrorMessage":"AsyncRntbdRequestRecord({\"args\":{\"transportRequestId\":102,\"activityId\":\"13ddd623-84e3-11eb-a42d-d5052d2cf638\",
        \"origin\":\"rntbd://10.10.10.10:10253\",
        \"replicaPath\":\"/apps/DocDbApp/services/DocDbServer15/partitions/a4cb495b-38c8-11e6-8106-8cdcd42c33be/replicas/1p\",
        \"timeCreated\":\"2021-03-14T16:33:59.072819500Z\",
        \"lifetime\":\"PT5.0601317S\"},
        \"requestLength\":333849,
        \"responseLength\":-1,
        \"status\":{\"done\":false,\"cancelled\":false,
        \"completedExceptionally\":false},
        \"timeline\":[
   {\"eventName\":\"created\",\"startTimeUTC\":\"2021-03-14T16:33:59.072819500Z\",\"durationInMicroSec\":995},
   {\"eventName\":\"queued\",\"startTimeUTC\":\"2021-03-14T16:33:59.073815300Z\",\"durationInMicroSec\":0},
   {\"eventName\":\"channelAcquisitionStarted\",\"startTimeUTC\":\"2021-03-14T16:33:59.073815300Z\",\"durationInMicroSec\":997},
   {\"eventName\":\"pipelined\",\"startTimeUTC\":\"2021-03-14T16:33:59.074812500Z\",\"durationInMicroSec\":1255947},
   {\"eventName\":\"transitTime\",\"startTimeUTC\":\"2021-03-14T16:34:00.330760Z\",\"durationInMicroSec\":3802608},
   {\"eventName\":\"received\",\"startTimeUTC\":null,\"durationInMicroSec\":0},
   {\"eventName\":\"completed\",\"startTimeUTC\":null,\"durationInMicroSec\":0}
   ]})",
   "causeInfo":null,
   "responseHeaders":"{}",
   "requestHeaders":"[Accept=application/json, x-ms-date=Sun, 14 Mar 2021 16:33:59 GMT, x-ms-documentdb-collection-rid=woU8APC+j8k=, x-ms-client-retry-attempt-count=0, Prefer=return=minimal, x-ms-documentdb-partitionkey=[\"/test/doc/foobar\"], x-ms-remaining-time-in-ms-on-client=60000, Content-Type=application/json]"
}

The code looks like this:

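CosmosAsyncContainer fs = ...;
Mono<?> mono = Mono.empty();
for( int i = 0; i < count; i++ ) {
    doc = ...;
    // Mono.and() combines the completion signals of all createItem() calls
    mono = mono.and( fs.createItem( doc ) );
}
// block() subscribes to the combined Mono, which runs all writes concurrently
mono.block();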

The last line of this code throws the exception. What is wrong with this asynchronous code? Is there another approach that works better for writing multiple documents?

All documents use the same partition key because they will also be queried together later.
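A common alternative to starting every write at once is to cap the number of requests in flight. With Reactor, which the async Cosmos API is built on, this is usually written as Flux.fromIterable(docs).flatMap(doc -> fs.createItem(doc), 4).then().block(), where the second flatMap argument limits how many inner publishers are subscribed at the same time. The JDK-only sketch below models the same idea with a Semaphore so it runs without the SDK. createItemAsync is a hypothetical stand-in for createItem that only sleeps, and the limit of 4 is an arbitrary example value, not a tuned recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedWrites {
    // Counters that let us observe how many writes overlap.
    static final class Stats {
        final AtomicInteger inFlight = new AtomicInteger();
        final AtomicInteger maxInFlight = new AtomicInteger();
        final AtomicInteger written = new AtomicInteger();
    }

    // Hypothetical stand-in for CosmosAsyncContainer.createItem(doc): it only
    // sleeps to simulate the network transfer and counts the write.
    static CompletableFuture<Void> createItemAsync(String doc, ExecutorService io, Stats stats) {
        return CompletableFuture.runAsync(() -> {
            stats.maxInFlight.accumulateAndGet(stats.inFlight.incrementAndGet(), Math::max);
            try {
                Thread.sleep(10); // simulated transfer time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            stats.written.incrementAndGet();
            stats.inFlight.decrementAndGet();
        }, io);
    }

    // Starts a write only when one of maxConcurrency slots is free, so no
    // request sits in a queue for long after it has been created.
    static Stats writeAll(List<String> docs, int maxConcurrency) {
        Stats stats = new Stats();
        ExecutorService io = Executors.newCachedThreadPool();
        Semaphore slots = new Semaphore(maxConcurrency);
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        try {
            for (String doc : docs) {
                slots.acquireUninterruptibly(); // wait for a free slot
                pending.add(createItemAsync(doc, io, stats)
                        .whenComplete((ok, err) -> slots.release()));
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
        } finally {
            io.shutdown();
        }
        return stats;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            docs.add("doc-" + i);
        }
        Stats stats = writeAll(docs, 4);
        System.out.println(stats.written.get() + " written, at most "
                + stats.maxInFlight.get() + " in flight");
    }
}
```

The design choice is that a request is only created when it can actually be sent soon, instead of creating all of them up front and letting them wait in a queue.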

Edit: After enabling the driver's debug log, I found this interesting snippet in the very long output:

[Azure Cosmos,DEBUG ,3/14 17:34:04,#00031] SessionTokenMismatchRetryPolicy not retrying because StatusCode or SubStatusCode not found.
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00031] received response to cancelled request: {"request":{},"response":{"type":{},"value":{}}}
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00031] Operation will NOT be retried. Current attempt {}, Exception: 
[Azure Cosmos,WARN  ,3/14 17:34:04,#00031] Operation will NOT be retried. Write operations which failed due to transient transport errors can not be retried safely when sending the request to the service because they arent idempotent. Current attempt {}, Exception: 
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00024] {}
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00024] {}
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00024] {}
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00024] {}
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00024] {}
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00024] {}
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00024] {}
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00024] {}
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00024] {}
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00031] Operation will NOT be retried. Exception:
[Azure Cosmos,DEBUG ,3/14 17:34:04,#00031] {"ClassName":"GoneException","userAgent":"azsdk-java-cosmos/4.12.0 Windows10/10.0 JRE/11.0.7","statusCode":410,"resourceAddress":"rntbd://10.10.10.16:10253/apps/DocDbApp/services/DocDbServer15/partitions/a4cb495b-38c8-11e6-8106-8cdcd42c33be/replicas/1p/","innerErrorMessage":"AsyncRntbdRequestRecord({\"args\":{\"transportRequestId\":102,\"activityId\":\"13ddd623-84e3-11eb-a42d-d5052d2cf638\",\"origin\":\"rntbd://10.10.10.16:10253\",\"replicaPath\":\"/apps/DocDbApp/services/DocDbServer15/partitions/a4cb495b-38c8-11e6-8106-8cdcd42c33be/replicas/1p\",\"timeCreated\":\"2021-03-14T16:33:59.072819500Z\",\"lifetime\":\"PT5.0601317S\"},\"requestLength\":333849,\"responseLength\":-1,\"status\":{\"done\":false,\"cancelled\":false,\"completedExceptionally\":false},\"timeline\":[{\"eventName\":\"created\",\"startTimeUTC\":\"2021-03-14T16:33:59.072819500Z\",\"durationInMicroSec\":995},{\"eventName\":\"queued\",\"startTimeUTC\":\"2021-03-14T16:33:59.073815300Z\",\"durationInMicroSec\":0},{\"eventName\":\"channelAcquisitionStarted\",\"startTimeUTC\":\"2021-03-14T16:33:59.073815300Z\",\"durationInMicroSec\":997},{\"eventName\":\"pipelined\",\"startTimeUTC\":\"2021-03-14T16:33:59.074812500Z\",\"durationInMicroSec\":1255947},{\"eventName\":\"transitTime\",\"startTimeUTC\":\"2021-03-14T16:34:00.330760Z\",\"durationInMicroSec\":3802608},{\"eventName\":\"received\",\"startTimeUTC\":null,\"durationInMicroSec\":0},{\"eventName\":\"completed\",\"startTimeUTC\":null,\"durationInMicroSec\":0}]})","causeInfo":null,"responseHeaders":"{}","requestHeaders":"[Accept=application/json, x-ms-date=Sun, 14 Mar 2021 16:33:59 GMT, x-ms-documentdb-collection-rid=woU8APC+j8k=, x-ms-client-retry-attempt-count=0, Prefer=return=minimal, x-ms-documentdb-partitionkey=[\"/test/doc/large.pdf\"], x-ms-remaining-time-in-ms-on-client=60000, Content-Type=application/json]"}
[Azure Cosmos,TRACE ,3/14 17:34:04,#00031]  at com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdRequestRecord.expire(RntbdRequestRecord.java:229)
[Azure Cosmos,TRACE ,3/14 17:34:04,#00031]  at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66)
[Azure Cosmos,TRACE ,3/14 17:34:04,#00031]  at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
[Azure Cosmos,TRACE ,3/14 17:34:04,#00031]  at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
[Azure Cosmos,TRACE ,3/14 17:34:04,#00031]  at java.base/java.lang.Thread.run(Thread.java:834)

Edit2: The underlying cause seems to be a timeout. The default request timeout of Cosmos DB is 5 seconds, and the error occurs after approximately 5-6 seconds. The timeout seems to be measured from the time each request is created (timeCreated in the log). Because the requests are created much faster than the items can be transferred, at some point a request exceeds the timeout. The progress of the ongoing transfer seems to be ignored.
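The mechanism described in Edit2 can be checked with simple arithmetic. The sketch below is a simplified model, not the driver's actual scheduling: it assumes all requests share one serialized channel, each takes the same transfer time, and the deadline is measured from the moment the request is created. If all n requests are created at once, request i finishes at (i + 1) * transfer, so every request past roughly timeout / transfer expires. If a new request is created only when one of k slots frees up, each request lives at most k * transfer. The transfer time in main is a hypothetical value.

```java
public class TimeoutModel {
    // Counts requests whose lifetime (completion time minus creation time)
    // exceeds timeoutMs. Requests are sent one after another over a single
    // channel, each taking transferMs; at most `window` requests exist at once.
    static int expiredRequests(int n, long transferMs, long timeoutMs, int window) {
        int expired = 0;
        for (int i = 0; i < n; i++) {
            long finished = (i + 1) * transferMs;
            // The first `window` requests are created at time 0; request i is
            // otherwise created when request (i - window) finishes.
            long created = i < window ? 0 : (i - window + 1) * transferMs;
            if (finished - created > timeoutMs) {
                expired++;
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long transferMs = 500;  // hypothetical transfer time per large document
        long timeoutMs = 5_000; // 5 s request timeout, as observed in the question
        int n = 40;
        // prints "all at once: 30 expired"
        System.out.println("all at once: " + expiredRequests(n, transferMs, timeoutMs, n) + " expired");
        // prints "4 at a time: 0 expired"
        System.out.println("4 at a time: " + expiredRequests(n, transferMs, timeoutMs, 4) + " expired");
    }
}
```

In this model the total transfer time is identical in both cases; only the time each request spends waiting after its creation changes, which matches the observation that progress on the channel does not reset the timeout.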

Upvotes: 0

Views: 3586

Answers (1)

Aditya Chavan

Reputation: 1

In your .yml file, add the setting connection-mode: gateway under the azure: cosmos section.
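Written out, the setting from this answer nests as shown below. This assumes a Spring Boot application using the Cosmos starter that reads the azure.cosmos prefix; with the plain Java SDK, as in the question, the equivalent choice is made on the client builder (CosmosClientBuilder.gatewayMode()). Gateway mode sends requests over HTTPS through the gateway instead of the direct rntbd:// connections that appear in the error above.

```yaml
azure:
  cosmos:
    connection-mode: gateway
```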

enjoy

Upvotes: 0
