Reputation: 3462
We are using the Azure java SDK to connect to ADL2 storage, and occasionally get an error like "connection timed out after 10000 ms". But I don't understand where that 10000ms timeout comes from, or how to change it. When we create a data-lake client, we specify all of these timeouts:
DataLakeServiceClientBuilder serviceClientBuilder = new DataLakeServiceClientBuilder()
    .endpoint("https://" + account.account_name + AZURE_STORAGE_HOST_SUFFIX + "/")
    .retryOptions(new RequestRetryOptions(
        RetryPolicyType.EXPONENTIAL
        , MAX_TRIES           // maximum number of times an operation is attempted, default is 4
        , TRY_TIMEOUT_SECONDS // maximum time allowed per try before a request is cancelled and assumed failed, default is Integer.MAX_VALUE
        , RETRY_DELAY_MS      // delay before retrying an operation, default is 4 seconds (4000 ms) when retryPolicyType is EXPONENTIAL
        , MAX_RETRY_DELAY_MS  // maximum delay allowed before retrying an operation, default is 120 seconds (120000 ms)
        , null                // secondaryHost - secondary storage account to retry requests against, default is none
    ));
Where the constants are defined as:
private static final Integer MAX_TRIES = 13;
private static final Integer TRY_TIMEOUT_SECONDS = null; // per-try timeout; null falls back to the SDK default
private static final Long RETRY_DELAY_MS = 60L;
private static final Long MAX_RETRY_DELAY_MS = 60000L;
I think the 10000 ms timeout is some other setting, or is perhaps hard-coded somewhere. Can it be changed?
The stack trace is here:
java/nio/file/Paths.get: reactor.core.Exceptions$ReactiveException: io.netty.channel.ConnectTimeoutException: connection timed out after 10000 ms: ttsdmsmoke1011storageop.blob.core.windows.net/20.209.154.134:443
reactor.core.Exceptions.propagate(Exceptions.java:410)
reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:101)
reactor.core.publisher.Flux.blockLast(Flux.java:2815)
com.azure.core.util.paging.ContinuablePagedByIteratorBase.requestPage(ContinuablePagedByIteratorBase.java:102)
com.azure.core.util.paging.ContinuablePagedByItemIterable$ContinuablePagedByItemIterator.<init>(ContinuablePagedByItemIterable.java:75)
com.azure.core.util.paging.ContinuablePagedByItemIterable.iterator(ContinuablePagedByItemIterable.java:55)
com.azure.core.util.paging.ContinuablePagedIterable.iterator(ContinuablePagedIterable.java:141)
com.redpointglobal.rg1.nio.adl2.Adl2FileSystem.listFileStores(Adl2FileSystem.java:151)
com.redpointglobal.rg1.nio.core.cloud.CloudFileSystem.initFileStores(CloudFileSystem.java:179)
com.redpointglobal.rg1.nio.core.cloud.CloudFileSystem.getFileStoreOrNull(CloudFileSystem.java:173)
com.redpointglobal.rg1.nio.core.cloud.CloudFileSystem._getPath(CloudFileSystem.java:117)
com.redpointglobal.rg1.nio.core.cloud.CloudFileSystem.getPath(CloudFileSystem.java:87)
com.redpointglobal.rg1.nio.core.FileSystemBase.getPath(FileSystemBase.java:186)
com.redpointglobal.rg1.nio.core.FileSystemProviderBase.getPath(FileSystemProviderBase.java:111)
net.redpoint.system.FileSystemProviderProxy.getPath(FileSystemProviderProxy.java:72)
java.base/java.nio.file.Path.of(Path.java:208)
java.base/java.nio.file.Paths.get(Paths.java:98)
Caused by:
io.netty.channel.ConnectTimeoutException: connection timed out after 10000 ms: ttsdmsmoke1011storageop.blob.core.windows.net/20.209.154.134:443
io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615)
io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:416)
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.base/java.lang.Thread.run(Thread.java:833)
I've tried to chase the call stack through the source code, but I get lost pretty quickly trying to figure out where the timeout comes from, because I don't understand how the SDK relates to Netty.
UPDATE: For more information, the failure occurs when we attempt to list the available file systems:
DataLakeServiceClient client = ...;
Iterator<FileSystemItem> fileSystems = client.listFileSystems().iterator();
The failure is probably unrelated to this specific operation; it is merely the first thing we ever do with the client, so it materializes the lazy connection and triggers the timeout. My question is still the same: how can we influence the connect timeout, or make the SDK retry the connection? The timeout parameters we specify seem to be ignored in this case.
Also, this is a rare failure. We see other disconnect/retry cases where the timeouts we specify during construction do apply and work. It is just this case -- an apparent failure on the client's first connect -- that is not handled.
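In the meantime, the only workaround we can think of is to wrap that first call in our own retry loop. A rough sketch of the idea (the attempt limit and back-off delay here are arbitrary placeholders, not values we have settled on):
Iterator<FileSystemItem> fileSystems = null;
for (int attempt = 1; fileSystems == null; attempt++) {
    try {
        // The first call is what materializes the connection, so retry it ourselves
        fileSystems = client.listFileSystems().iterator();
    } catch (RuntimeException e) { // the connect timeout surfaces as a ReactiveException
        if (attempt >= 3) {
            throw e; // give up after a few attempts (arbitrary limit)
        }
        try {
            Thread.sleep(5_000); // arbitrary back-off before retrying
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw e;
        }
    }
}
But we would rather fix the underlying setting than retry around it.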
Upvotes: 0
Views: 100
Reputation: 10370
io.netty.channel.ConnectTimeoutException: connection timed out after 10000 ms: ttsdmsmoke1011storageop.blob.core.windows.net/20.209.154.134:443
According to this document, the connection timeout is 10 seconds by default. A timeout like this can also happen when a firewall blocks the request from Java. If you need to increase the connection timeout, that is typically done by building a custom HttpClient and setting its connectTimeout, readTimeout, and writeTimeout parameters.
Code:
import com.azure.core.http.HttpClient;
import com.azure.core.http.netty.NettyAsyncHttpClientBuilder;
import com.azure.identity.DefaultAzureCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.storage.common.policy.RequestRetryOptions;
import com.azure.storage.common.policy.RetryPolicyType;
import com.azure.storage.file.datalake.DataLakeServiceClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;
import com.azure.storage.file.datalake.models.FileSystemItem;

import java.time.Duration;
import java.util.Iterator;

// Build a custom HTTP client with longer timeouts
HttpClient customHttpClient = new NettyAsyncHttpClientBuilder()
        .connectTimeout(Duration.ofSeconds(30)) // increase connection timeout (default is 10 seconds)
        .readTimeout(Duration.ofSeconds(60))    // increase read timeout
        .writeTimeout(Duration.ofSeconds(60))   // increase write timeout
        .build();

DefaultAzureCredential defaultCredential = new DefaultAzureCredentialBuilder().build();

// Create a DataLakeServiceClient with retry options and the custom HTTP client
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
        .endpoint("https://xxxx.dfs.core.windows.net") // replace with your account name
        .httpClient(customHttpClient)                  // use the custom HTTP client
        .retryOptions(new RequestRetryOptions(
                RetryPolicyType.EXPONENTIAL,
                MAX_TRIES,           // constants as defined in your question
                TRY_TIMEOUT_SECONDS,
                RETRY_DELAY_MS,
                MAX_RETRY_DELAY_MS,
                null                 // no secondary host for retries
        ))
        .credential(defaultCredential) // use appropriate credentials
        .buildClient();

try {
    // List available file systems
    Iterator<FileSystemItem> fileSystems = dataLakeServiceClient.listFileSystems().iterator();
    System.out.println("Available File Systems:");
    while (fileSystems.hasNext()) {
        FileSystemItem fileSystemItem = fileSystems.next();
        System.out.println("- " + fileSystemItem.getName());
    }
} catch (Exception e) {
    System.err.println("Error occurred while listing file systems: " + e.getMessage());
    e.printStackTrace();
}
Output:
Available File Systems:
- data
- insights-logs-storagedelete
- insights-logs-storageread
- insights-logs-storagewrite
- insights-metrics-pt1m
- sample
- test
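If you need even finer control over connect behavior, NettyAsyncHttpClientBuilder can also wrap a preconfigured Reactor Netty client, where the connect timeout is a channel option. A minimal sketch, assuming the azure-core-http-netty and reactor-netty dependencies are on the classpath (the 30-second value is just an example):
import com.azure.core.http.HttpClient;
import com.azure.core.http.netty.NettyAsyncHttpClientBuilder;
import io.netty.channel.ChannelOption;

// Reactor Netty client with the connect timeout set at the channel level;
// 30_000 ms is an example value, not a recommendation
reactor.netty.http.client.HttpClient reactorClient =
        reactor.netty.http.client.HttpClient.create()
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 30_000);

// Wrap it so it can be passed to DataLakeServiceClientBuilder.httpClient(...)
HttpClient azureHttpClient = new NettyAsyncHttpClientBuilder(reactorClient).build();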
Reference: java.net.ConnectException: Operation timed out (Connection timed out) | Azure blobs - Stack Overflow, answer by Sridevi.
Upvotes: 1