Freejack

Reputation: 302

Bloated logs output in Python Azure Function Log Stream Monitoring when the function accesses data from Azure Blob

My HTTP-triggered Azure Function has a workflow that consists of 3 steps:

  1. It receives an API call with some parameters

  2. It reads the data from the Azure Blob with this function:

import io

import pandas as pd
from azure.storage.blob import BlobServiceClient


def read_dataframe_from_blob(account_name, account_key, container_name, blob_name):
    # Create a connection string to the Azure Blob storage account
    connect_str = f"DefaultEndpointsProtocol=https;AccountName={account_name};AccountKey={account_key};EndpointSuffix=core.windows.net"

    # Create a BlobServiceClient object using the connection string
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)

    # Get a reference to the Parquet blob
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)

    # Download the blob data as a stream
    blob_data = blob_client.download_blob()

    # Read the Parquet data from the stream into a pandas DataFrame
    df = pd.read_parquet(io.BytesIO(blob_data.readall()))

    return df
  3. It preprocesses the data from step 2 and returns some output.

I previously created a very similar workflow, and its Function Log Stream was clean: it included only the messages I emitted through logging. However, when I read the data from the blob, the logs in the Azure Function Log Stream (and locally, of course) start with:

2023-06-05T07:35:42Z   [Information]   Request URL: 'https://myaccount.blob.core.windows.net/mycontainer/my.parquet'
Request method: 'GET'
Request headers:
    'x-ms-range': 'REDACTED'
    'x-ms-version': 'REDACTED'
    'Accept': 'application/xml'
    'User-Agent': 'azsdk-python-storage-blob/12.16.0 Python/3.10.11 (Linux-5.10.164.1-1.cm1-x86_64-with-glibc2.31)'
    'x-ms-date': 'REDACTED'
    'x-ms-client-request-id': '932afd88-0373-11ee-8724-1270efe16c2d'
    'Authorization': 'REDACTED'
No body was attached to the request
2023-06-05T07:35:42Z   [Information]   Response status: 206
Response headers:
    'Content-Length': '33554432'
    'Content-Type': 'application/octet-stream'
    'Content-Range': 'REDACTED'
    'Last-Modified': 'Thu, 01 Jun 2023 08:00:30 GMT'
    'Accept-Ranges': 'REDACTED'
    'ETag': '"0x8DB627644CFEA3E"'
    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
    'x-ms-request-id': '08843836-f01e-0019-6780-974298000000'
    'x-ms-client-request-id': '932afd88-0373-11ee-8724-1270efe16c2d'
    'x-ms-version': 'REDACTED'
    'x-ms-creation-time': 'REDACTED'
    'x-ms-blob-content-md5': 'REDACTED'
    'x-ms-lease-status': 'REDACTED'
    'x-ms-lease-state': 'REDACTED'
    'x-ms-blob-type': 'REDACTED'
    'Content-Disposition': 'REDACTED'
    'x-ms-server-encrypted': 'REDACTED'
    'Date': 'Mon, 05 Jun 2023 07:35:42 GMT'

...repeated multiple times. Then I get the info from my logs.

What is the reason for such behaviour? Is there any smooth way to optimize the code or avoid these bloated logs?

Edit: I've found a similar discussion here, but I'm not sure how to replicate it for a Python app.

Edit 2: It's not a solution, but I've found a GitHub bug report here.

Still, I would appreciate any workarounds.

Upvotes: 1

Views: 364

Answers (1)

naman srivastava

Reputation: 11

import logging

# Set the desired log level (e.g., INFO, DEBUG, ERROR)
logging.basicConfig(level=logging.INFO)

def main(req):
    # Your code to access data from Azure Blob

    # Example logging statements
    logging.info("Accessing data from Azure Blob")
    logging.debug("Debug message")
    logging.error("Error message")

    # Rest of your function code

    return "Function executed successfully"

In this code, logging.basicConfig() sets up the basic configuration for logging, including the desired log level. You can adjust the log level to control the verbosity of the logs (e.g., logging.INFO, logging.DEBUG, logging.ERROR).
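
Note that a global level of INFO still lets through the INFO-level request and response dumps shown in the question. A commonly used workaround is to raise the level of only the logger that the Azure SDK uses for HTTP tracing, leaving your own loggers untouched. The sketch below assumes that logger is named "azure.core.pipeline.policies.http_logging_policy" (the name azure-core registers for its HTTP logging policy; check it against your installed version). The helper name quiet_azure_sdk_logs is hypothetical.

```python
import logging

# Assumed name of the logger used by azure-core's HTTP logging policy.
# Verify against your installed azure-core version.
AZURE_HTTP_LOGGER = "azure.core.pipeline.policies.http_logging_policy"


def quiet_azure_sdk_logs(level=logging.WARNING):
    """Raise the threshold of the SDK's HTTP logger only (hypothetical helper)."""
    logging.getLogger(AZURE_HTTP_LOGGER).setLevel(level)


# Call once at module import time, before any blob client is used.
quiet_azure_sdk_logs()

# Your own logging calls are unaffected by the change above.
logging.getLogger(__name__).info("Accessing data from Azure Blob")
```

If other SDK components turn out to be noisy as well, the same call on the parent logger "azure" should cover every child logger that has no level of its own, at the cost of also hiding INFO messages you might want.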

Upvotes: -1
