Vicky

Reputation: 73

How to implement fast search on Azure Blob?

I have finished writing the code to upload text files to Azure Blob storage. Now I want to provide search based on the content of those files. For example, if I search for "Hello", the names of the files that contain the word "Hello" should appear in the search results. Here is my search code:

class BlobSearch
{
    static void Main(string[] args)
    {
        string searchText = "Hello";
        // azureConString is the storage account connection string (defined elsewhere).
        CloudStorageAccount account = CloudStorageAccount.Parse(azureConString);
        CloudBlobClient blobClient = account.CreateCloudBlobClient();
        CloudBlobContainer blobContainer = blobClient.GetContainerReference("MyBlobContainer");

        blobContainer.FetchAttributes();

        // Enumerate every blob in the container.
        var blobItemList = blobContainer.ListBlobs();

        foreach (var item in blobItemList)
        {
            string line = string.Empty;
            CloudBlockBlob blockBlob = blobContainer.GetBlockBlobReference(item.Uri.ToString());

            // Only text files are searched.
            if (blockBlob.Name.Contains(".txt"))
            {
                int lineno = 1;
                // Stream the blob and scan it line by line.
                using (var stream = blockBlob.OpenRead())
                {
                    using (StreamReader reader = new StreamReader(stream))
                    {
                        while ((line = reader.ReadLine()) != null)
                        {
                            if (line.IndexOf(searchText) != -1)
                            {
                                Console.WriteLine("Line : " + lineno + " => " + blockBlob.Name);
                            }
                            lineno++;
                        }
                    }
                }
            }
        }
        Console.WriteLine("SEARCH COMPLETE");
        Console.ReadLine();
    }
}

The code above works, but it is too slow. Is there a faster way to do this, or a way to improve the code above?

Upvotes: 3

Views: 1336

Answers (3)

richard

Reputation: 12498

That is a very bad way to do it. Every query has to download and scan the full content of every file, so it will be very slow. The best option for this is Azure Search, which can now automatically index your blobs!
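
To illustrate why an index helps, here is a toy in-memory word index. It is only a sketch of the general idea and not how Azure Search is implemented or called; the class name TinyWordIndex and its methods are made up for this example. Each file is read once when it is added, and a later lookup is a dictionary access instead of a scan of every file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy index: maps each word to the set of file names containing it.
sealed class TinyWordIndex
{
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

    // Word lookups ignore case; this is a choice made for the example.
    private readonly Dictionary<string, HashSet<string>> filesByWord =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Called once per file, for example right after the upload.
    public void Add(string fileName, string content)
    {
        foreach (string word in content.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            HashSet<string> files;
            if (!filesByWord.TryGetValue(word, out files))
            {
                files = new HashSet<string>();
                filesByWord[word] = files;
            }
            files.Add(fileName);
        }
    }

    // Returns the names of the files that contain the word (empty if none).
    public ICollection<string> Find(string word)
    {
        HashSet<string> files;
        if (filesByWord.TryGetValue(word, out files))
        {
            return files;
        }
        return new HashSet<string>();
    }
}
```

After index.Add("a.txt", "Hello world") and index.Add("b.txt", "Goodbye"), index.Find("Hello") contains only "a.txt". A hosted search service does much more (tokenization, ranking, persistence), so treat this purely as an illustration of the trade-off.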

Upvotes: 1

usr

Reputation: 171178

Your code is not bad. Find out where most of the time is spent; it is probably the network or the CPU. If it is the network, you are out of luck. If it is the CPU, you can parallelize.
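
A minimal sketch of the parallelization idea, assuming the file contents are already in memory. The dictionary of name to text is a hypothetical stand-in for downloaded blobs, and the ParallelSearch class is invented for this example:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ParallelSearch
{
    // documents: hypothetical map of file name -> already-downloaded text.
    // Returns the names of the files whose text contains searchText, sorted by name.
    public static List<string> FindNamesContaining(
        IDictionary<string, string> documents, string searchText)
    {
        // ConcurrentBag is safe to add to from several threads at once.
        var hits = new ConcurrentBag<string>();

        // Each document is scanned independently, so the work spreads across cores.
        Parallel.ForEach(documents, doc =>
        {
            if (doc.Value.Contains(searchText))
            {
                hits.Add(doc.Key);
            }
        });

        // Parallel execution gives no ordering guarantee, so sort before returning.
        return hits.OrderBy(name => name, StringComparer.Ordinal).ToList();
    }
}
```

This only helps if the scan itself is CPU-bound; measure first, as suggested above.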

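You are using culture-specific string processing: line.IndexOf(searchText) compares using the current culture. StringComparison.Ordinal is far less CPU intensive (like 10x). It has different semantics, though, because it compares raw character values instead of applying culture rules.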
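
A small sketch of the ordinal variant of the line scan. The OrdinalLineSearch helper is made up for this example, and it takes any TextReader so it can be tried without a storage account:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class OrdinalLineSearch
{
    // Returns the 1-based numbers of the lines that contain searchText.
    // The comparison is ordinal: case-sensitive and culture-independent.
    public static List<int> FindMatchingLines(TextReader reader, string searchText)
    {
        var matches = new List<int>();
        string line;
        int lineNo = 1;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.IndexOf(searchText, StringComparison.Ordinal) >= 0)
            {
                matches.Add(lineNo);
            }
            lineNo++;
        }
        return matches;
    }
}
```

For example, FindMatchingLines(new StringReader("Hello world\nbye\nsay Hello"), "Hello") returns lines 1 and 3. In the question's loop, the same change would be passing StringComparison.Ordinal as the second argument to line.IndexOf.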

Upvotes: 0

Kevin Cook

Reputation: 1932

// get blob data
CloudBlob cloudBlob = blobContainer.GetBlobReference(blobName);
string text = cloudBlob.DownloadText();

Maybe downloading it in one go is faster than reading line by line in a loop?

Upvotes: 1
