amethianil

Reputation: 530

Read one line from a 200 GB text file stored in Azure Blob Storage using C#

I have a 200 GB text file in Azure Blob Storage. I want to search the text and download only the matching line, rather than downloading the whole 200 GB file and then selecting that line.

I have written C# code that downloads the complete file and then searches it and selects the line, but it takes too long and then fails with a timeout error.

var contents = ""; // the whole text is downloaded from Azure Blob Storage into this string
StringReader strReader = new StringReader(contents);
var searchedLines1 = contents.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
            .Select((text, index) => new { text, lineNumber = index + 1 })
            .Where(x => x.text.Contains("[email protected]") || x.lineNumber == 1);

Upvotes: 0

Views: 1156

Answers (1)

TheGeneral

Reputation: 81573

You will need to stream the file and set the timeout. I have wrapped the stream reading in an IAsyncEnumerable, which is completely unnecessary... but why not.

Given

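public static async IAsyncEnumerable<string> Read(StreamReader stream)
{
   // ReadLineAsync returns null at the end of the stream, so no EndOfStream check
   // (which can block synchronously) is needed
   string line;
   while ((line = await stream.ReadLineAsync()) != null)
      yield return line;
}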

Usage

var blobClient = new BlobClient( ... , new BlobClientOptions()
{
   Transport = new HttpClientTransport(new HttpClient {Timeout = Timeout.InfiniteTimeSpan}),
   Retry = {NetworkTimeout = Timeout.InfiniteTimeSpan}
});

await using var stream = await blobClient.OpenReadAsync();
using var reader = new StreamReader(stream);

await foreach (var line in Read(reader))
   if (line.Contains("bob"))
   {
      Console.WriteLine("Yehaa");
      // break here (or do whatever you need) once a match is found
   }
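
The query in the question also kept line 1 and tracked line numbers. A minimal sketch of how that might be done while still streaming is below. `LineSearch` and `FindLines` are hypothetical names, not part of the Azure SDK; the helper accepts any `TextReader`, so it can be tried locally against a `StringReader` before being handed the `StreamReader` that wraps the blob stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineSearch
{
    // Hypothetical helper: reads one line at a time and yields the first line
    // plus every line containing `term`, together with its 1-based line number.
    public static async IAsyncEnumerable<(int LineNumber, string Text)> FindLines(
        TextReader reader, string term)
    {
        var lineNumber = 0;
        string line;
        // ReadLineAsync returns null once the end of the input is reached
        while ((line = await reader.ReadLineAsync()) != null)
        {
            lineNumber++;
            if (lineNumber == 1 || line.Contains(term))
                yield return (lineNumber, line);
        }
    }
}
```

It would be consumed the same way as Read above, for example `await foreach (var (number, text) in LineSearch.FindLines(reader, "bob"))`. Only one line is held in memory at a time, so the file size should not matter beyond the time it takes to read.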

Disclaimer: Completely untested

Note: If you are using C# 4, you will need to remove all the awaits and async methods and just use a plain loop with stream.ReadLine
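
For the synchronous route described in the note, a rough sketch might look like the following. `SyncLineSearch` and `FindFirstLine` are hypothetical names; the helper takes any `Stream`, so with the blob client it would presumably be given the stream from the synchronous `OpenRead()` call (an assumption, equally untested), and it can be exercised locally with a `MemoryStream`.

```csharp
using System;
using System.IO;

public static class SyncLineSearch
{
    // Hypothetical helper: reads the stream line by line and returns the first
    // line containing `term`, or null if no line matches. Reading stops at the
    // first hit, so the rest of the stream is never pulled.
    public static string FindFirstLine(Stream stream, string term)
    {
        // Disposing the reader also disposes the underlying stream
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains(term))
                    return line;
            }
        }
        return null;
    }
}
```

Because the method returns as soon as a match is found, a hit near the start of the file avoids reading most of the 200 GB; a miss still reads the whole blob.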

Upvotes: 2
