Reputation: 2900
I'm trying to retrieve a large file from an API and save it to an Azure Storage account, so I am writing an Azure Function. I don't want my code to download all the data and then write it all at once. The API gives me an input data stream, and I would like to stream that data to an output blob.
Here is a small example:
import azure.functions as func

def main(req: func.HttpRequest, outputblob: func.Out[func.InputStream]) -> func.HttpResponse:
    name = "stranger"
    # mimic a stream
    for char in name:
        outputblob.set(char)
    return func.HttpResponse(
        "Hello " + name,
        status_code=200
    )
Here is my function.json:
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": [
        "get",
        "post"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    },
    {
      "type": "blob",
      "direction": "out",
      "name": "outputblob",
      "path": "container/hello.txt",
      "connection": "connection_storage"
    }
  ]
}
When I open the file container/hello.txt in my storage account, it contains only the last character, "r", and is only 1 byte in size.
I think that outputblob.set(data) overwrites the contents of the output blob on every call.
How can I stream data and append it to my output blob? I'd rather use output blob bindings, but I can use "ContainerClient" objects if necessary.
(EDIT: The docs specify that we can use streams as func.Out[func.InputStream].)
Upvotes: 0
Views: 2219
Reputation: 2900
I used the upload_blob method of azure.storage.blob.BlobClient with blob_type="AppendBlob":
import logging

import requests
from azure.storage.blob import BlobServiceClient


def stream_to_blob(url: str, filename: str) -> int:
    """Download a stream from a URL into a blob."""
    logging.info("Downloading...")
    sess = requests.Session()
    get_fichier = sess.get(url, stream=True)
    get_fichier.raise_for_status()
    # Connect to the storage account:
    connection_str = "DefaultEndpointsProtocol=..."
    blob_service_client = BlobServiceClient.from_connection_string(connection_str)
    container_client = blob_service_client.get_container_client("mycontainer")
    blob_client = container_client.get_blob_client(filename)
    filesize = 0
    # Append data one block at a time.
    # We can upload up to 4 MB at a time.
    for block in get_fichier.iter_content(4 * 1024 * 1024):
        filesize += len(block)
        logging.info(
            "Appending %d bytes to the blob (total = %d)...", len(block), filesize
        )
        blob_client.upload_blob(block, blob_type="AppendBlob")
    logging.info("Downloading finished")
    return filesize
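For comparison, here is a minimal sketch of the same append loop written against the two explicit BlobClient methods create_append_blob and append_block, rather than upload_blob. The names append_stream and AppendBlobLike are hypothetical helpers introduced for illustration, and the sketch assumes the client passed in behaves like azure.storage.blob.BlobClient for those two calls. It is not a tested replacement for the function above.

```python
from typing import Iterable, Protocol


class AppendBlobLike(Protocol):
    """Hypothetical protocol: the two BlobClient methods this sketch relies on."""

    def create_append_blob(self) -> object: ...

    def append_block(self, data: bytes) -> object: ...


def append_stream(blob_client: AppendBlobLike, chunks: Iterable[bytes]) -> int:
    """Create an append blob, append each chunk in order, return the total size."""
    # Assumption: creating the append blob gives us an empty blob to start from.
    blob_client.create_append_blob()
    total = 0
    for chunk in chunks:
        if not chunk:
            # Skip empty chunks so no zero-length block is sent.
            continue
        blob_client.append_block(chunk)
        total += len(chunk)
    return total
```

With the code above, this could be called as append_stream(blob_client, get_fichier.iter_content(4 * 1024 * 1024)). Creating the blob explicitly makes it clearer that each run starts from a fresh blob, whereas the upload_blob version appears to keep appending if the blob already exists.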
Upvotes: 1
Reputation: 91
In the loop where you mimic a stream, you overwrite the content of the output blob on each iteration, which is why you end up with only the last letter.
Solution: assign the whole array of bytes to the outputblob in a single call.
See https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-output?tabs=python for the reference.
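A minimal sketch of that suggestion, assuming outputblob is the func.Out binding from the question (only its set method is used here). The helper name write_whole_payload is hypothetical. Note that this buffers the entire payload in memory before writing, so it fixes the overwrite problem but does not give true streaming.

```python
from typing import Iterable


def write_whole_payload(outputblob, chunks: Iterable[bytes]) -> int:
    """Join all chunks and write them to the output binding in one call."""
    # Build the full payload first; every chunk is held in memory here.
    data = b"".join(chunks)
    # A single set() call, so nothing is overwritten by a later call.
    outputblob.set(data)
    return len(data)
```

In the question's function, the loop would become something like write_whole_payload(outputblob, (c.encode() for c in name)).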
Upvotes: 1