Valeri Ilyin

Reputation: 49

Azure Data Lake Store - existing connection was forcibly closed by the remote host

I use the DataLakeStoreFileSystemManagementClient class for reading files from Data Lake Store. We open a stream for the file with code like the following, read it byte by byte and process it. This is a specific case where we cannot use U-SQL for the data processing.

m_adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(…);
return m_adlsFileSystemClient.FileSystem.OpenAsync(m_connection.AccountName, path);

The process may take up to 60 minutes to read and process the file. The problem is that I frequently get an "An existing connection was forcibly closed by the remote host." exception while reading the stream, especially when the reading takes 20 minutes or more. It should not be a timeout, because I create the DataLakeStoreFileSystemManagementClient with a correct client timeout setting. You can find the exception details below. The exception looks random and it is difficult to predict when you will get it. It can occur at the 15th minute as well as the 50th minute of processing time.

Is this a normal situation when reading files from Data Lake Store? Are there any restrictions (or recommendations) on the total time a stream for a file in Data Lake Store can be kept open?
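For context, the CopyStream method referenced in the exception stack trace was roughly of this shape (a minimal sketch; only the method signature appears in the trace, so the body and buffer size here are assumptions):

```csharp
using System.IO;

static class FileDownloader
{
    // Sketch of the long-running copy loop. Every Read call goes over the
    // same long-lived HTTPS connection, so a mid-transfer connection reset
    // surfaces as an IOException from deep inside the network stack.
    public static void CopyStream(Stream input, Stream output)
    {
        var buffer = new byte[81920]; // buffer size is an assumption
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, bytesRead);
        }
    }
}
```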

Exception:

   System.AggregateException: One or more errors occurred. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.ConnectStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at System.Net.Http.HttpClientHandler.WebExceptionWrapperStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.Net.Http.DelegatingStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at DataLake.Timeout.Research.FileDownloader.CopyStream(Stream input, Stream output) in C:\TFS-SED\Main\Platform\DataNode\DataLake\DataLake.Timeout.Research\FileDownloader.cs:line 107
   at DataLake.Timeout.Research.FileDownloader.<DownloadFileAsync>d__6.MoveNext() in C:\TFS-SED\Main\Platform\DataNode\DataLake\DataLake.Timeout.Research\FileDownloader.cs:line 96

Upvotes: 1

Views: 1402

Answers (2)

Valeri Ilyin

Reputation: 49

Thanks Amit, your advice finally helped me. Here is my version. The sample reads a batch of bytes with retry logic. I use it through a BufferedStream with a 4 MB buffer, so the client can read the stream object by object while we request the service in 4 MB batches.

while (!m_endOfFile)
{
    try
    {
        // Request the next batch from the service at the current offset.
        var inputStream = m_client.OpenReadFile(
            m_filePath,
            length: count,
            offset: m_position);

        var memoryStream = new MemoryStream(count);
        inputStream.CopyTo(memoryStream);
        m_position += memoryStream.Length;
        result = memoryStream.ToArray();
        break;
    }
    catch (CloudException ex)
    {
        // An offset past the end of the file means we have read everything.
        if (ex.Response.Content.ToString().Contains("Invalid offset value"))
        {
            m_endOfFile = true;
        }
        else
        {
            throw;
        }
    }
    catch (IOException)
    {
        // Transient network failure: retry the same offset a limited number of times.
        repeats++;
        if (repeats >= RepeatCount)
        {
            throw;
        }
    }
}
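The BufferedStream wiring described above can be sketched as follows. The custom Stream that wraps the retrying batch reader is not shown in the answer, so a MemoryStream stands in for it here; the point is only the buffering behavior, where callers read byte by byte while the underlying source is read in 4 MB batches:

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Stand-in for the custom Stream wrapping the retrying batch
        // reader above (that class is not shown in the answer).
        Stream batchReader = new MemoryStream(new byte[] { 10, 20, 30 });

        // A 4 MB BufferedStream lets callers make many small reads while
        // the underlying source is hit in large batches.
        using (var buffered = new BufferedStream(batchReader, 4 * 1024 * 1024))
        {
            Console.WriteLine(buffered.ReadByte()); // prints 10
            Console.WriteLine(buffered.ReadByte()); // prints 20
        }
    }
}
```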

Upvotes: 0

Amit Kulkarni

Reputation: 965

To avoid these types of issues, the following is recommended:

  • Read in smaller, retriable chunks. In my experience, 4 MB chunks offer the best performance. By reading in smaller increments, you can also incorporate retry logic that retries from the same offset in the event of a failure.

If you do not know how large your stream is (for example, it is being appended to by another worker while you are reading), you can check for a 400 error whose RemoteException payload contains "BadOffsetException". This indicates that you have started at an offset that is beyond the end of the file.

const int MAX_BYTES_TO_READ = 4 * 1024 * 1024; // 4 MB
…
long offset = 0;
bool notDone = true;
while (notDone)
{
    try
    {
        var myStream = client.Read(accountName, offset, MAX_BYTES_TO_READ);
        // do stuff with the stream, then advance offset by the number of
        // bytes actually consumed before the next iteration
    }
    catch (WebException ex)
    {
        // read the web exception response body into a string, then:
        if (response.Contains("BadOffsetException"))
            notDone = false;
    }
}

Upvotes: 3
