Reputation: 341
I want to utilize an AZURE Function app to read an XLSX File from an AZURE BLOB Storage. The Function shall be called by a REST API call. I can access the blob and download the file but I'm struggling with reading the content of the file with pandas directly. I'm searching and trying for hours but can't find a solution. My latest approach looks like this:
def main(req: func.HttpRequest) -> func.HttpResponse:
logging.info('Python HTTP trigger function processed a request.')
blob_service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container_client = blob_service_client.get_container_client(CONTAINERNAME)
blob_client = blob_service_client.get_blob_client(container = CONTAINERNAME, blob=BLOBNAME)
blob = BlobClient(ACCOUNT_URL, CONTAINERNAME, BLOBNAME)
#READ PRODUCTS FILE
blob_client.download_blob().readinto(LOCALFILENAME)
df = pd.read_excel(blob_client.download_blob())
On the MS homepage*, there is one example to download a file from a blob and process it afterwards but as I'm utilizing a function app it doesn't make sense to download the file first if I'm not missing anything...
*https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python
The autocomplete for blob gives me the following options but they are also not really helpful:
The error messages are different depending on the way I try to read the file. The current one is:
System.Private.CoreLib: Exception while executing function: Functions.TrainProductModel. System.Private.CoreLib: Result: Failure
Exception: AttributeError: 'str' object has no attribute 'write'
But I think there is actually something fundamentally wrong with my approach. The desired result is to read the file directly into a pandas table.
Appreciating any support as this blocking progress with my master thesis :/
Upvotes: 2
Views: 1904
Reputation: 14080
Pandas itself does not have the ability to parse xlsx files. Pandas parses xlsx files based on the external library xlrd. And you shouldn't install the high version of xlrd, because the high version cancels the support for xlsx files (only xls files are supported), and the recommended version is 1.2.0 (this is valid for me).
Below is my code:
import os, uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
import pandas as pd
CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
CONTAINERNAME = "test"
BLOBNAME = "test.xlsx"
LOCALFILENAME = "testx.xlsx"
blob_service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container_client = blob_service_client.get_container_client(CONTAINERNAME)
blob_client = blob_service_client.get_blob_client(container = CONTAINERNAME, blob=BLOBNAME)
#READ PRODUCTS FILE
f = open(LOCALFILENAME, "wb")
f.write(blob_client.download_blob().content_as_bytes())
f.close()
df = pd.read_excel(r''+LOCALFILENAME)
print(df)
And it works on my side:
Upvotes: 2