Wheezil

Reputation: 3462

10000ms timeout in netty while using azure java sdk connecting to ADL2

We are using the Azure java SDK to connect to ADL2 storage, and occasionally get an error like "connection timed out after 10000 ms". But I don't understand where that 10000ms timeout comes from, or how to change it. When we create a data-lake client, we specify all of these timeouts:

DataLakeServiceClientBuilder serviceClientBuilder = new DataLakeServiceClientBuilder()
    .endpoint("https://" + account.account_name + AZURE_STORAGE_HOST_SUFFIX + "/")
    .retryOptions( new RequestRetryOptions(
        RetryPolicyType.EXPONENTIAL
        , MAX_TRIES                // Maximum number of attempts an operation will be retried, default is 4
        , TRY_TIMEOUT_SECONDS    // Maximum time allowed before a request is cancelled and assumed failed, default is Integer.MAX_VALUE
        , RETRY_DELAY_MS        // Amount of delay to use before retrying an operation, default value is 4ms when retryPolicyType is EXPONENTIAL
        , MAX_RETRY_DELAY_MS    // Maximum delay allowed before retrying an operation, default value is 120ms
        , null                    // secondaryHost - Secondary Storage account to retry requests against, default is none
    ));

Where the constants are defined as:

    private static final Integer MAX_TRIES = 13;
    private static final Integer TRY_TIMEOUT_SECONDS = null;    // overall timeout limit imposed by retry schedule
    private static final Long RETRY_DELAY_MS = 60L;
    private static final Long MAX_RETRY_DELAY_MS = 60000L;

I think that the 10000ms timeout is some other setting, or perhaps hard-coded somewhere? Can it be changed?

The stack trace is here:

java/nio/file/Paths.get: reactor.core.Exceptions$ReactiveException: io.netty.channel.ConnectTimeoutException: connection timed out after 10000 ms: ttsdmsmoke1011storageop.blob.core.windows.net/20.209.154.134:443
reactor.core.Exceptions.propagate(Exceptions.java:410)
reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:101)
reactor.core.publisher.Flux.blockLast(Flux.java:2815)
com.azure.core.util.paging.ContinuablePagedByIteratorBase.requestPage(ContinuablePagedByIteratorBase.java:102)
com.azure.core.util.paging.ContinuablePagedByItemIterable$ContinuablePagedByItemIterator.<init>(ContinuablePagedByItemIterable.java:75)
com.azure.core.util.paging.ContinuablePagedByItemIterable.iterator(ContinuablePagedByItemIterable.java:55)
com.azure.core.util.paging.ContinuablePagedIterable.iterator(ContinuablePagedIterable.java:141)
com.redpointglobal.rg1.nio.adl2.Adl2FileSystem.listFileStores(Adl2FileSystem.java:151)
com.redpointglobal.rg1.nio.core.cloud.CloudFileSystem.initFileStores(CloudFileSystem.java:179)
com.redpointglobal.rg1.nio.core.cloud.CloudFileSystem.getFileStoreOrNull(CloudFileSystem.java:173)
com.redpointglobal.rg1.nio.core.cloud.CloudFileSystem._getPath(CloudFileSystem.java:117)
com.redpointglobal.rg1.nio.core.cloud.CloudFileSystem.getPath(CloudFileSystem.java:87)
com.redpointglobal.rg1.nio.core.FileSystemBase.getPath(FileSystemBase.java:186)
com.redpointglobal.rg1.nio.core.FileSystemProviderBase.getPath(FileSystemProviderBase.java:111)
net.redpoint.system.FileSystemProviderProxy.getPath(FileSystemProviderProxy.java:72)
java.base/java.nio.file.Path.of(Path.java:208)
java.base/java.nio.file.Paths.get(Paths.java:98)
Caused by:
io.netty.channel.ConnectTimeoutException: connection timed out after 10000 ms: ttsdmsmoke1011storageop.blob.core.windows.net/20.209.154.134:443
io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615)
io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:416)
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.base/java.lang.Thread.run(Thread.java:833)

I've tried to chase the call stack through the source code, but I get lost pretty quickly trying to figure out where the timeout comes from, because I don't understand how the SDK relates to Netty.

UPDATE: For more information, the failure occurs when we attempt to list the available file systems:

DataLakeServiceClient client = ...;
Iterator<FileSystemItem> fileSystems = client.listFileSystems().iterator();

The failure is probably unrelated to this specific operation; it is merely the first thing we ever do with the connection, so it materializes the lazy connection and we hit the timeout. My question remains the same: how can we influence the connect timeout, or make the client retry the connection? The timeout parameters we specify seem to be ignored in this case.

Also, this is a rare failure. In other cases of disconnect/retry, the timeouts we specify during construction do apply and work. It is only this case -- an apparent failure on the client's first connect -- that is not handled.
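One workaround we are considering in the meantime is wrapping that first call in a retry loop of our own. A minimal sketch of the idea (a hypothetical helper of ours, not an SDK facility; the actual listFileSystems usage is shown only in a comment):

```java
import java.util.function.Supplier;

// Hypothetical helper (our own code, not part of the Azure SDK): retry an
// operation on RuntimeException with a fixed delay between attempts.
public class FirstConnectRetry {
    static <T> T withRetry(Supplier<T> op, int maxTries, long delayMs) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // e.g. the reactor-wrapped ConnectTimeoutException
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw last;
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        // With the real client this would look like:
        // Iterator<FileSystemItem> it =
        //     withRetry(() -> client.listFileSystems().iterator(), 3, 1000L);
        // Self-contained demonstration: fail twice, succeed on the third try.
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("connection timed out");
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

This does not shorten the 10000 ms wait itself, of course; it only retries around it.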

Upvotes: 0

Views: 100

Answers (1)

Venkatesan

Reputation: 10370

io.netty.channel.ConnectTimeoutException: connection timed out after 10000 ms: ttsdmsmoke1011storageop.blob.core.windows.net/20.209.154.134:443

According to the Azure documentation:

By default, the connection timeout is 10 seconds.

  • A connection timeout error can occur when an excessive number of requests overloads the server.
  • Also check your system configuration and make sure no firewall is blocking requests from Java.

If you need to increase the connection timeout, you can do so by configuring a custom HttpClient, for example by setting its connectTimeout, readTimeout, and writeTimeout parameters.

Code:

import com.azure.core.http.HttpClient;
import com.azure.core.http.netty.NettyAsyncHttpClientBuilder;
import com.azure.identity.DefaultAzureCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.storage.common.policy.RequestRetryOptions;
import com.azure.storage.common.policy.RetryPolicyType;
import com.azure.storage.file.datalake.DataLakeServiceClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;
import com.azure.storage.file.datalake.models.FileSystemItem;
import java.time.Duration;
import java.util.Iterator;

HttpClient customHttpClient = new NettyAsyncHttpClientBuilder()
    .connectTimeout(Duration.ofSeconds(30)) // Increase connection timeout
    .readTimeout(Duration.ofSeconds(60))    // Increase read timeout
    .writeTimeout(Duration.ofSeconds(60))   // Increase write timeout
    .build();

DefaultAzureCredential defaultCredential = new DefaultAzureCredentialBuilder().build();

// Create a DataLakeServiceClient with retry options and the custom HTTP client
// (MAX_TRIES, TRY_TIMEOUT_SECONDS, RETRY_DELAY_MS, MAX_RETRY_DELAY_MS as in the question)
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
    .endpoint("https://xxxx.dfs.core.windows.net") // Replace with your account name
    .httpClient(customHttpClient) // Use the custom HTTP client
    .retryOptions(new RequestRetryOptions(
        RetryPolicyType.EXPONENTIAL,
        MAX_TRIES,
        TRY_TIMEOUT_SECONDS,
        RETRY_DELAY_MS,
        MAX_RETRY_DELAY_MS,
        null // No secondary host for retries
    ))
    .credential(defaultCredential) // Use appropriate credentials
    .buildClient();

try {
    // List available file systems
    Iterator<FileSystemItem> fileSystems = dataLakeServiceClient.listFileSystems().iterator();

    System.out.println("Available File Systems:");
    while (fileSystems.hasNext()) {
        FileSystemItem fileSystemItem = fileSystems.next();
        System.out.println("- " + fileSystemItem.getName());
    }
} catch (Exception e) {
    System.err.println("Error occurred while listing file systems: " + e.getMessage());
    e.printStackTrace();
}

Output:

Available File Systems:
- data
- insights-logs-storagedelete
- insights-logs-storageread
- insights-logs-storagewrite
- insights-metrics-pt1m
- sample
- test
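For context, a connect timeout of this kind bounds only the TCP connection attempt, not the request/response exchange, which is why it is configured on the HTTP client rather than through RequestRetryOptions. The same semantics can be seen with a plain JDK socket (illustration only; 198.51.100.1 is a reserved documentation address per RFC 5737 that should not answer):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectTimeoutDemo {
    public static void main(String[] args) {
        try (Socket s = new Socket()) {
            // The second argument bounds only the connection attempt itself,
            // analogous to connectTimeout on the Netty HTTP client.
            s.connect(new InetSocketAddress("198.51.100.1", 443), 2000);
            System.out.println("connected");
        } catch (IOException e) {
            // Typically a SocketTimeoutException after about 2000 ms
            System.out.println("connect failed: " + e.getClass().getSimpleName());
        }
    }
}
```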


Reference: java.net.ConnectException: Operation timed out (Connection timed out) | Azure blobs - Stack Overflow, answer by Sridevi.

Upvotes: 1
