Reputation: 75
I'm trying to develop a Python script that reads an .xlsx file from a blob storage container called "source", converts it to .csv, and stores it in a new container. (I'm testing the script locally; if it works, I'll include it in an ADF pipeline.) So far I've managed to access the blob storage, but I'm having problems reading the file content.
from azure.storage.blob import BlobServiceClient, ContainerClient, BlobClient
import pandas as pd
conn_str = "DefaultEndpointsProtocol=https;AccountName=XXXXXX;AccountKey=XXXXXX;EndpointSuffix=core.windows.net"
container = "source"
blob_name = "prova.xlsx"
container_client = ContainerClient.from_connection_string(
    conn_str=conn_str,
    container_name=container
)
# Download blob as StorageStreamDownloader object (stored in memory)
downloaded_blob = container_client.download_blob(blob_name)
df = pd.read_excel(downloaded_blob)
print(df)
I get the following error:
ValueError: Invalid file path or buffer object type: <class 'azure.storage.blob._download.StorageStreamDownloader'>
I tried with a .csv file as input, parsing it as follows (with StringIO imported from io):
df = pd.read_csv(StringIO(downloaded_blob.content_as_text()))
and it works.
Any suggestions on how to modify the code so that the Excel file becomes readable?
Upvotes: 0
Views: 3725
Reputation: 1
Change
df = pd.read_excel(downloaded_blob)
to
df = pd.read_excel(downloaded_blob.content_as_bytes())
Upvotes: 0
Reputation: 23111
To summarize the solution: pd.read_excel() in pandas accepts a file path, raw bytes, or a file-like object, but not the azure.storage.blob.StorageStreamDownloader that download_blob returns when we download the Excel file from Azure Blob Storage. So we need to call readall() or content_as_bytes() on it to get the content as bytes before passing it to pandas. For more details, please refer to the pandas and azure-storage-blob documentation.
Upvotes: 2