Reputation: 1592

Downloading of blob in pageblob vs blockblob

Please correct me if I am wrong. Is downloading a page blob the same as downloading a block blob, with only the write operations being different? I assume downloading is the same because, given a blob URI, we can just use a GET to download the blob. Lastly, is there documentation that supports my hypothesis that downloading a blob works the same way regardless of the blob type?

Upvotes: 1

Answers (2)

Gaurav Mantri

Reputation: 136336

The answer is both yes and no.

Downloading is the same for all kinds of blobs: to download any blob, you eventually perform the Get Blob REST API operation, and that operation works for every blob type.
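To illustrate the point, here is a minimal sketch of a type-agnostic download using only the Python standard library. The function name `download_blob_by_url` is hypothetical, and it assumes the URL can be read without extra headers (for example a public blob or a URL carrying a SAS token). Get Blob returns the blob type in the `x-ms-blob-type` response header, but the body is read the same way in every case.

```python
import shutil
import urllib.request


def download_blob_by_url(blob_url, dest_path):
    """Download a blob with a plain HTTP GET and return its reported blob type.

    blob_url is assumed to be readable without extra headers, for example a
    public blob URL or one carrying a SAS token (hypothetical input).
    """
    with urllib.request.urlopen(blob_url) as response, open(dest_path, "wb") as out:
        # Get Blob reports the type in this response header; the body is read
        # the same way whether it says BlockBlob, PageBlob or AppendBlob.
        blob_type = response.headers.get("x-ms-blob-type")
        shutil.copyfileobj(response, out)
    return blob_type
```

The returned value is `None` when the server does not send that header, so callers should not rely on it being present.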

However, with page blobs you can optimize the download by fetching only the occupied page ranges. This is called a sparse download. You first find the occupied page ranges in the page blob (the Get Page Ranges operation) and then download only those ranges. This makes downloading a sparsely populated page blob much faster. For example, if you have a 128 GB page blob that contains only 32 GB of data (the rest is empty), a sparse download transfers only 32 GB. This is not possible with other blob types.
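A sparse download could be sketched roughly as follows. This is an illustration, not the SDK's own implementation: `sparse_download` is a hypothetical helper, and `blob_client` is assumed to behave like the v12 `azure.storage.blob.BlobClient`, where `get_page_ranges()` returns a pair of lists (occupied, cleared) of `{"start", "end"}` dictionaries with inclusive byte offsets.

```python
def sparse_download(blob_client, dest_path):
    """Download only the occupied ranges of a page blob into dest_path.

    blob_client is assumed to behave like azure.storage.blob.BlobClient (v12):
    get_blob_properties(), get_page_ranges() and download_blob(offset, length).
    Returns the number of bytes actually transferred.
    """
    total_size = blob_client.get_blob_properties().size
    # Assumption: get_page_ranges() returns (occupied, cleared), each a list of
    # {"start": ..., "end": ...} dicts with inclusive byte offsets.
    occupied, _cleared = blob_client.get_page_ranges()

    downloaded = 0
    with open(dest_path, "wb") as out:
        # Pre-size the file; regions that are never written read back as zeros
        # on most platforms.
        out.truncate(total_size)
        for page_range in occupied:
            offset = page_range["start"]
            length = page_range["end"] - page_range["start"] + 1
            data = blob_client.download_blob(offset=offset, length=length).readall()
            out.seek(offset)
            out.write(data)
            downloaded += len(data)
    return downloaded
```

The local file ends up the same size as the blob, but only the occupied ranges cross the network.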

I have not checked the recent versions of the Storage SDKs, but I am fairly sure they implement sparse download for page blobs.

Upvotes: 2

SaiKarri-MT

Reputation: 1301

Page blobs are usually large, and they often act as disks (volumes) for virtual machines.

“Page blobs are made up of 512-byte pages up to 8 TB in total size and are designed for frequent random read/write operations.”

See the Microsoft documentation to learn more about page blobs.

There are three types of blobs in Azure Storage: block, append and page.

The Python code below downloads every blob in a container to local storage.

import os

from azure.storage.blob import BlobServiceClient

MY_CONNECTION_STRING = "STORAGE_ACCOUNT_STRING"
CONTAINER = "CONTAINERNAME"
LOCAL_PATH = "REPLACE_THIS"


class AzureBlobFileDownloader:
  def __init__(self):
    # Connect to the storage account and get a client for the container
    self.blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
    self.my_container = self.blob_service_client.get_container_client(CONTAINER)

  def save_blob(self, file_name, file_content):
    # Get full path to the file
    download_file_path = os.path.join(LOCAL_PATH, file_name)
    # For nested blobs, create the local directories as well
    os.makedirs(os.path.dirname(download_file_path), exist_ok=True)
    with open(download_file_path, "wb") as file:
      file.write(file_content)

  def download_blobs_in_container(self):
    # download_blob() is called the same way for block, append and page blobs
    my_blobs = self.my_container.list_blobs()
    for blob in my_blobs:
      print(blob.name)
      content = self.my_container.get_blob_client(blob.name).download_blob().readall()
      self.save_blob(blob.name, content)


# Download every blob in the container.
azure_blob_file_downloader = AzureBlobFileDownloader()
azure_blob_file_downloader.download_blobs_in_container()

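The code below, from the Azure Python SDK samples, shows more advanced handling of page blobs. Note that it uses the legacy (pre-v12) SDK API (create_page_blob_service, update_page) rather than BlobServiceClient.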

#Page Blob Operations
def page_blob_operations(self, account):
    file_to_upload = "HelloWorld.png"
    page_size = 1024
    
    # Create a page blob service object
    pageblob_service = account.create_page_blob_service()
    container_name = 'pageblobcontainer' + self.random_data.get_random_name(6)

    try:
        # Create a new container
        print('1. Create a container with name - ' + container_name)
        pageblob_service.create_container(container_name)
        
        # Create a new page blob to upload the file
        print('2. Create a page blob')
        pageblob_service.create_blob(container_name, file_to_upload, page_size * 1024)
        
        # Read the file
        print('3. Upload pages to page blob')
        index = 0
        with open(file_to_upload, "rb") as file:
            file_bytes = file.read(page_size)
            while len(file_bytes) > 0:
                if len(file_bytes) < page_size:
                    file_bytes = bytes(file_bytes + bytearray(page_size - len(file_bytes)))
                    
                pageblob_service.update_page(container_name, file_to_upload, file_bytes, index * page_size, index * page_size + page_size - 1)
                
                file_bytes = file.read(page_size)
                
                index = index + 1
        
        pages = pageblob_service.get_page_ranges(container_name, file_to_upload)
        
        print('4. Enumerate pages in page blob')
        for page in pages:
            print('Page ' + str(page.start) + ' - ' + str(page.end))
    finally:
        print('5. Delete container')
        if pageblob_service.exists(container_name):
            pageblob_service.delete_container(container_name)

Have a look at the advanced samples, which show how to handle page, block and append blobs with the Azure SDK for Python.

Refer to the Azure docs for handling blobs in Node.js.

Upvotes: 1
