Be Chiller Too

Reputation: 2900

How to stream data to an output blob binding with Azure Function?

I'm trying to retrieve a large file from an API and save it to an Azure Storage account, so I am designing an Azure Function. I don't want my code to download all the data first and only then write it out. I can get an input data stream from this API, and I would like to stream that data to an output blob.

Here is a small example:

import azure.functions as func

def main(req: func.HttpRequest, outputblob: func.Out[func.InputStream]) -> func.HttpResponse:
    name = "stranger"

    # mimic a stream
    for char in name:
        outputblob.set(char)

    return func.HttpResponse(
        "Hello "+name,
        status_code=200
    )

Here is my function.json:

{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    },
    {
      "type": "blob",
      "direction": "out",
      "name": "outputblob",
      "path": "container/hello.txt",
      "connection": "connection_storage"
    }
  ]
}

When I open container/hello.txt in my storage account, it contains only the last character, "r", and is only 1 byte in size.

I think that each call to outputblob.set(data) overwrites the contents of the output blob.

How can I stream data and append it to my output blob? I'd rather use output blob bindings, but I can also use "ContainerClient" objects if necessary.

(EDIT: The docs specify that streams can be used as func.Out[func.InputStream].)

Upvotes: 0

Views: 2219

Answers (2)

Be Chiller Too

Reputation: 2900

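I used the upload_blob method of azure.storage.blob.BlobClient with blob_type="AppendBlob". With that blob type and the default overwrite=False, each call appends to the existing blob instead of replacing it: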

import logging

import requests
from azure.storage.blob import BlobServiceClient


def stream_to_blob(url: str, filename: str) -> int:
    """Download a stream from a URL into a blob."""
    logging.info("Downloading...")
    sess = requests.Session()
    get_fichier = sess.get(url, stream=True)
    get_fichier.raise_for_status()

    # Connect to the storage account:
    connection_str = "DefaultEndpointsProtocol=..."
    blob_service_client = BlobServiceClient.from_connection_string(connection_str)
    container_client = blob_service_client.get_container_client("mycontainer")
    blob_client = container_client.get_blob_client(filename)
    filesize = 0

    # Appending data one block at a time
    # We can upload data up to 4 MB at a time
    for block in get_fichier.iter_content(4 * 1024 * 1024):
        filesize += len(block)
        logging.info(
            "Appending %d bytes to the blob (total = %d)...", len(block), filesize
        )
        blob_client.upload_blob(block, blob_type="AppendBlob")

    logging.info("Downloading finished")
    return filesize
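
The same idea can also be written with the dedicated append-blob calls. The following is only a sketch, assuming azure-storage-blob v12, where BlobClient exposes create_append_blob() and append_block(). The helper name append_chunks and its parameters are hypothetical and not part of the answer above. It accepts any object with those two methods, so it can be tried without a storage account.

```python
from typing import Iterable


def append_chunks(blob_client, chunks: Iterable[bytes]) -> int:
    """Create an empty append blob, then append each chunk to it in order.

    blob_client is assumed to behave like azure.storage.blob.BlobClient (v12):
    create_append_blob() creates an empty append blob and append_block(data)
    commits one block to its end.
    """
    blob_client.create_append_blob()
    total = 0
    for chunk in chunks:
        if not chunk:
            # Skip empty chunks; there is nothing to append.
            continue
        blob_client.append_block(chunk)
        total += len(chunk)
    return total
```

With the variables from the answer's code, this could be called as append_chunks(blob_client, get_fichier.iter_content(4 * 1024 * 1024)).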

Upvotes: 1

Jacek Kuliś

Reputation: 91

In the loop that mimics a stream, you overwrite the content of the output blob on each iteration, which is why only the last letter remains at the end.

Solution: assign the whole byte array to outputblob in a single set() call.

See https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-output?tabs=python for the reference.
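
A minimal sketch of that suggestion, assuming the binding object only needs a set() method, as in the question's code. The helper name write_once is hypothetical. Note that joining the chunks holds the whole payload in memory, which is what the question hoped to avoid, so this suits small payloads only.

```python
from typing import Iterable


def write_once(outputblob, chunks: Iterable[bytes]) -> int:
    """Join every chunk in memory and write the result with one set() call."""
    payload = b"".join(chunks)
    # A single call, so no later call can overwrite what was written.
    outputblob.set(payload)
    return len(payload)
```

Inside main, this could replace the loop, for example write_once(outputblob, (c.encode() for c in name)).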

Upvotes: 1
