Reputation: 530
I have a 200 GB text file in Azure Blob Storage. I want to search the text and download only the matching lines, instead of downloading the whole 200 GB file and then selecting those lines.
I have written C# code that downloads the complete file and then searches it and selects the matching lines, but it takes too long and then fails with a timeout error.
var contents = ""; // the whole text, downloaded from Azure Blob Storage
// Keep line 1 (the header) plus every line that contains the search text
var searchedLines1 = contents
    .Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
    .Select((text, index) => new { text, lineNumber = index + 1 })
    .Where(x => x.text.Contains("[email protected]") || x.lineNumber == 1);
Upvotes: 0
Views: 1156
Reputation: 81573
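You will need to stream the file instead of downloading it all at once, and set the timeouts. This still transfers the data over the network, but it is read line by line, so the whole file is never held in memory and you can stop as soon as you find what you need. I have wrapped the stream reading in an IAsyncEnumerable,
which is completely unnecessary... but why not.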
Given
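using System.Collections.Generic; // IAsyncEnumerable<T>

public static async IAsyncEnumerable<string> Read(StreamReader reader)
{
    string? line;
    // ReadLineAsync returns null at end of stream, so no blocking EndOfStream check is needed
    while ((line = await reader.ReadLineAsync()) != null)
        yield return line;
}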
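If you want to check the Read helper before pointing it at a 200 GB blob, you can feed it a StreamReader over an in-memory stream. This is a minimal sketch: the LineStreaming class and its CollectLinesAsync method are hypothetical names for illustration, and the helper is repeated inside the class so the snippet compiles on its own.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class LineStreaming
{
    // Same idea as Read above: yield one line at a time until ReadLineAsync returns null.
    public static async IAsyncEnumerable<string> Read(StreamReader reader)
    {
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
            yield return line;
    }

    // Hypothetical test helper: wraps a string in a MemoryStream (standing in for the
    // blob stream) and drains the async sequence into a list so it can be inspected.
    public static async Task<List<string>> CollectLinesAsync(string text)
    {
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(text));
        using var reader = new StreamReader(stream);
        var lines = new List<string>();
        await foreach (var line in Read(reader))
            lines.Add(line);
        return lines;
    }
}
```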
Usage
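using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using Azure.Core.Pipeline;  // HttpClientTransport
using Azure.Storage.Blobs;  // BlobClient, BlobClientOptions

// Turn off both the HttpClient timeout and the SDK's per-request network timeout
var blobClient = new BlobClient( ... , new BlobClientOptions()
{
    Transport = new HttpClientTransport(new HttpClient { Timeout = Timeout.InfiniteTimeSpan }),
    Retry = { NetworkTimeout = Timeout.InfiniteTimeSpan }
});

// OpenReadAsync returns a stream that downloads the blob in chunks as it is read
await using var stream = await blobClient.OpenReadAsync();
using var reader = new StreamReader(stream);

await foreach (var line in Read(reader))
    if (line.Contains("bob"))
    {
        Console.WriteLine("Yehaa");
        // exit, or whatever
    }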
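To get what the question actually asks for (the first line plus every line containing the search text), the same streaming loop can collect matches with their line numbers and stop early. This is a sketch only, assuming .NET Core 2.1 or later for string.Contains with a StringComparison; LineSearch, FindMatchingLinesAsync and maxMatches are hypothetical names. It accepts any TextReader, so it can take the StreamReader over blobClient.OpenReadAsync() from the usage code, and unlike the Split in the question it also counts blank lines, so the numbers match the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class LineSearch
{
    // Hypothetical helper: streams lines from any TextReader and keeps line 1 plus every
    // line containing searchTerm, mirroring the filter in the question without loading
    // the whole file into memory. Stops reading after maxMatches matching lines.
    public static async Task<List<(int LineNumber, string Text)>> FindMatchingLinesAsync(
        TextReader reader, string searchTerm, int maxMatches = int.MaxValue)
    {
        var results = new List<(int LineNumber, string Text)>();
        int lineNumber = 0;
        int matches = 0;
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            lineNumber++;
            if (lineNumber == 1)
            {
                results.Add((lineNumber, line)); // always keep the first (header) line
                continue;
            }
            if (line.Contains(searchTerm, StringComparison.Ordinal))
            {
                results.Add((lineNumber, line));
                if (++matches >= maxMatches)
                    break; // stop reading, and therefore downloading, early
            }
        }
        return results;
    }
}
```

With the reader from the usage code above, a call would look like var hits = await LineSearch.FindMatchingLinesAsync(reader, "bob", maxMatches: 10). Breaking out early should also stop the download, since the blob stream only fetches chunks as they are read.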
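Disclaimer: completely untested.
Note: async streams (IAsyncEnumerable, await foreach, await using) need C# 8 or later. If you are on an older version such as C# 4, you will need to remove all the awaits and async methods, and just use a plain while loop with reader.ReadLine.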
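For the synchronous route in the note, the loop is a plain while over ReadLine, which returns null at the end of the stream. A minimal sketch; SyncLineSearch and FindLines are hypothetical names, it avoids newer syntax (no nullable annotations, tuples or using declarations), and the reader can wrap any blob stream, for example the one returned by blobClient.OpenRead().

```csharp
using System.Collections.Generic;
using System.IO;

public static class SyncLineSearch
{
    // Synchronous variant: reads one line at a time until ReadLine returns null
    // and collects every line containing searchTerm.
    public static List<string> FindLines(TextReader reader, string searchTerm)
    {
        var results = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Contains(searchTerm))
                results.Add(line);
        }
        return results;
    }
}
```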
Upvotes: 2