Pablo

Reputation: 11041

Azure Blobstore: How can I read a file without having to download the whole thing first?

I'm trying to figure out how to read a file from Azure blob storage.

Studying its documentation, I can see that the download_blob method seems to be the main way to access a blob.

This method, though, seems to require downloading the whole blob into a file or some other stream.

Is it possible to read a file from Azure Blob Storage line by line, as a stream from the service, without downloading the whole thing first?

Upvotes: 2

Views: 2441

Answers (2)

Ivan Glasenberg

Reputation: 30035

Update 0710:

In the latest SDK azure-storage-blob 12.3.2, we can also do the same thing by using download_blob.

The source code of download_blob shows that it accepts optional offset and length parameters.

So just provide an offset and a length, like below (it works in my tests):

# read 100 bytes starting at byte offset 60
blob_client.download_blob(offset=60, length=100)
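To read a whole blob piece by piece with the offset and length parameters, something like the following sketch might work. The helper names iter_ranges and read_in_ranges and the default chunk size are my own, not part of the SDK, and the code assumes blob_client is an azure-storage-blob v12 BlobClient.

```python
def iter_ranges(total_length, chunk_size):
    """Yield (offset, length) pairs that together cover total_length bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < total_length:
        # The last range may be shorter than chunk_size.
        length = min(chunk_size, total_length - offset)
        yield offset, length
        offset += length


def read_in_ranges(blob_client, chunk_size=4 * 1024 * 1024):
    """Yield the blob's bytes one range at a time.

    Assumption: blob_client is an azure.storage.blob.BlobClient (v12).
    """
    total_length = blob_client.get_blob_properties().size
    for offset, length in iter_ranges(total_length, chunk_size):
        # Each call requests only this byte range from the service.
        yield blob_client.download_blob(offset=offset, length=length).readall()
```

Only one range is held in memory at a time, so the chunk size trades memory use against the number of requests.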

Original answer:

You cannot read the blob line by line, but you can read it by byte range. For example, first read 10 bytes of the data, then continue with the next range of bytes, and so on.

The sample below uses the older version 2.1.0 of the Python Blob Storage SDK (for the newer SDK, see the update above). Install it like below:

pip install azure-storage-blob==2.1.0

Here is the sample code (here I read the data as text, but you can switch to the get_blob_to_stream(container_name, blob_name, stream, start_range=0, end_range=9) method to read into a stream):

from azure.storage.blob import BlockBlobService

account_name = "xxxx"
account_key = "xxxx"
blob_service_client = BlockBlobService(account_name=account_name, account_key=account_key)

container_name = "test2"
blob_name = "a5.txt"

# Get the length of the blob; useful if you need a loop to read the whole blob range by range.
blob_property = blob_service_client.get_blob_properties(container_name, blob_name)

print("the length of the blob is: " + str(blob_property.properties.content_length) + " bytes")
print("**********")

# Get the first 10 bytes. start_range and end_range are both inclusive, so this is bytes 0-9.
b1 = blob_service_client.get_blob_to_text(container_name, blob_name, start_range=0, end_range=9)

# You can use the method below to read into a stream instead (stream is a writable file-like object).
# blob_service_client.get_blob_to_stream(container_name, blob_name, stream, start_range=0, end_range=9)

print(b1.content)
print("*******")

# Get the next range of data (bytes 10-49).
b2 = blob_service_client.get_blob_to_text(container_name, blob_name, start_range=10, end_range=49)

print(b2.content)
print("********")

# Get the next range of data (bytes 50-199).
b3 = blob_service_client.get_blob_to_text(container_name, blob_name, start_range=50, end_range=199)

print(b3.content)
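Since the service hands back bytes rather than lines, one way to get line-by-line reading is to stitch the lines together on the client as each range or chunk arrives. Below is a minimal sketch of that idea. The function name iter_lines is my own, and it only assumes you have some iterable of bytes objects (for example the results of successive range reads).

```python
def iter_lines(chunks):
    """Yield lines (as bytes, newline included) from an iterable of byte chunks.

    A line that is split across two chunks is buffered until its
    newline arrives, so only the current chunk plus one partial line
    is held in memory.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        lines = buffer.split(b"\n")
        # The last element is an incomplete line (or empty); keep it for later.
        buffer = lines.pop()
        for line in lines:
            yield line + b"\n"
    if buffer:
        # Final line without a trailing newline.
        yield buffer
```

With the v12 SDK, I believe you could feed it the chunk iterator of a download, e.g. for line in iter_lines(blob_client.download_blob().chunks()), and decode each line afterwards; treat that usage as an assumption to verify against your SDK version.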

Upvotes: 2

ap1997

Reputation: 183

The accepted answer here may be of use to you. The documentation can be found here.

Upvotes: 0
