Reputation: 997
I have two questions about reading and writing Python objects from/to Azure Blob Storage.
How can I write a Python DataFrame as a CSV file directly into Azure Blob Storage without storing it locally?
I tried the functions create_blob_from_text and create_blob_from_stream, but neither of them works.
Converting the DataFrame to a string and passing it to create_blob_from_text writes the file into the blob, but as a plain string rather than as CSV:
df_b = df.to_string()
block_blob_service.create_blob_from_text('test', 'OutFilePy.csv', df_b)
How can I read a JSON file in Azure Blob Storage directly into Python?
Upvotes: 18
Views: 52641
Reputation: 80
You can write the CSV into an in-memory BytesIO buffer and upload it with the upload_blob method from the azure.storage.blob module (v12 SDK). You will also need a container_client from the same module:
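from io import BytesIO
from azure.storage.blob import ContainerClient
# conn_str and "test" are placeholders for your own connection string and container name
container_client = ContainerClient.from_connection_string(conn_str, container_name="test")
blob_report_name = 'OutFilePy.csv'
stream_file = BytesIO()
df.to_csv(stream_file)  # use the DataFrame itself; pandas >= 1.2 can write to a binary buffer
file_to_blob = stream_file.getvalue()
blob_client = container_client.get_blob_client(blob_report_name)
blob_client.upload_blob(data=file_to_blob, overwrite=True)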
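A minimal sketch of the same idea wrapped in two helpers, assuming pandas 1.2 or newer (older versions cannot write to_csv output into a binary buffer). The names dataframe_to_csv_bytes and upload_dataframe are hypothetical; blob_client is assumed to be anything with an upload_blob(data=..., overwrite=...) method, such as the object returned by container_client.get_blob_client(...). Unlike the answer, it passes index=False, so the DataFrame index is not written; drop that argument to keep it.

```python
from io import BytesIO

import pandas as pd


def dataframe_to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialize a DataFrame to UTF-8 CSV bytes entirely in memory."""
    buffer = BytesIO()
    # pandas >= 1.2 accepts binary file handles in to_csv
    df.to_csv(buffer, index=False, encoding="utf-8")
    return buffer.getvalue()


def upload_dataframe(blob_client, df: pd.DataFrame) -> None:
    """Upload df as CSV through a BlobClient-like object (hypothetical helper)."""
    blob_client.upload_blob(data=dataframe_to_csv_bytes(df), overwrite=True)
```

Keeping serialization separate from the upload makes the CSV step testable without an Azure account.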
Upvotes: 0
Reputation: 21
Here's an example of writing a Python DataFrame into Azure Blob Storage without storing it locally. It doesn't require io.StringIO and uses ContainerClient instead of BlockBlobService.
import pandas as pd
from azure.storage.blob import ContainerClient
def write_csv(env, df_path, df):
    # the first argument is the account URL, e.g. https://<account>.blob.core.windows.net
    container_client = ContainerClient(
        env['container_url'],
        container_name=env['container_name'],
        credential=env['container_cred']
    )
    # to_csv() with no path returns the CSV text as a str
    output = df.to_csv(index_label="idx", encoding="utf-8")
    print(output)
    blob_client = container_client.get_blob_client(df_path)
    blob_client.upload_blob(output, overwrite=True)
    return 'success'
Upvotes: 2
Reputation: 71
The SDK has changed: with the current azure-storage-blob package (BlobServiceClient), the create_blob_from_text method is no longer available. Instead, use get_blob_client to get a client for the blob and call upload_blob on it. The blob does not need to exist beforehand:
from datetime import datetime
from azure.storage.blob import BlobServiceClient, ContentSettings
# ACCOUNT_NAME, ACCOUNT_KEY and DEST_CONTAINER are placeholders defined elsewhere
output = dataframe.to_csv(index_label="idx", encoding="utf-8")
blob_service = BlobServiceClient.from_connection_string(
    f"DefaultEndpointsProtocol=https;AccountName={ACCOUNT_NAME};AccountKey={ACCOUNT_KEY};EndpointSuffix=core.windows.net"
)
current_time = datetime.now()
blob_client = blob_service.get_blob_client(container=DEST_CONTAINER,
    blob="kcScenarioTest/" + str(current_time.microsecond) + ".csv")
blob_client.upload_blob(output, overwrite=True,
    content_settings=ContentSettings(content_type="text/csv"))
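For code written against the legacy BlockBlobService, a minimal sketch of how a create_blob_from_text(container, blob, text) call maps onto the v12 pattern above. The helper name create_blob_from_text_v12 is hypothetical; service is assumed to be a BlobServiceClient (or any object with the same get_blob_client method), and extra keyword arguments such as content_settings are passed straight through to upload_blob.

```python
def create_blob_from_text_v12(service, container: str, blob: str, text: str, **upload_kwargs):
    """Hypothetical stand-in for the legacy create_blob_from_text, built on
    BlobServiceClient.get_blob_client and BlobClient.upload_blob."""
    # get_blob_client only builds a client; the blob does not have to exist yet
    blob_client = service.get_blob_client(container=container, blob=blob)
    # overwrite=True replaces an existing blob, as the legacy call did
    blob_client.upload_blob(text, overwrite=True, **upload_kwargs)
    return blob_client
```

Usage would look like create_blob_from_text_v12(blob_service, "test", "OutFilePy.csv", output).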
Upvotes: 3
Reputation: 109
The approved answer did not work for me, as it depends on the azure-storage (deprecated/legacy as of 2021) package. I changed it as follows:
import os
import dotenv
import pandas as pd
from azure.storage.blob import ContainerClient
dotenv.load_dotenv()  # loads CONNECTION_STRING and CONTAINER_NAME from a .env file
blob_block = ContainerClient.from_connection_string(
    conn_str=os.environ["CONNECTION_STRING"],
    container_name=os.environ["CONTAINER_NAME"]
)
partial = pd.DataFrame()  # replace with the DataFrame you want to upload
output = partial.to_csv(encoding='utf-8')  # returns the CSV text as a str
name = "OutFilePy.csv"  # placeholder blob name
blob_block.upload_blob(name, output, overwrite=True, encoding='utf-8')
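The answer above reads its settings from environment variables that dotenv.load_dotenv() fills in from a .env file. A small sketch that fails with a clearer message when one of them is missing; the function name load_blob_settings is hypothetical, while the variable names CONNECTION_STRING and CONTAINER_NAME come from the answer.

```python
import os
from typing import Mapping, Optional, Tuple


def load_blob_settings(environ: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (connection_string, container_name) from the environment."""
    env = os.environ if environ is None else environ
    # collect every missing or empty variable so the error lists them all at once
    missing = [key for key in ("CONNECTION_STRING", "CONTAINER_NAME") if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return env["CONNECTION_STRING"], env["CONTAINER_NAME"]
```

The result can be passed to ContainerClient.from_connection_string(conn_str=..., container_name=...) as in the answer.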
Upvotes: 10
Reputation: 23792
1. Can someone tell me how to write Python dataframe as csv file directly into Azure Blob without storing it locally?
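You could use the pandas.DataFrame.to_csv method. Called without a path, it returns the CSV content as a string, which create_blob_from_text can upload as-is.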
Sample code:
# BlockBlobService is from the legacy SDK (azure-storage-blob 2.x and earlier)
from azure.storage.blob import BlockBlobService
import pandas as pd
head = ["col1", "col2", "col3"]
l = [[1, 2, 3], [4, 5, 6], [8, 7, 9]]
df = pd.DataFrame(l, columns=head)
print(df)
# with no path argument, to_csv returns the CSV text as a str
output = df.to_csv(index_label="idx", encoding="utf-8")
print(output)
accountName = "***"
accountKey = "***"
containerName = "test1"
blobService = BlockBlobService(account_name=accountName, account_key=accountKey)
blobService.create_blob_from_text(containerName, 'OutFilePy.csv', output)
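The question's to_string() attempt produced a "plain string" because to_string() renders a human-readable, space-aligned table, while to_csv() with no path returns comma-separated text. A quick check that needs no Azure account, reusing the sample DataFrame from this answer:

```python
import pandas as pd

head = ["col1", "col2", "col3"]
l = [[1, 2, 3], [4, 5, 6], [8, 7, 9]]
df = pd.DataFrame(l, columns=head)

# to_string(): aligned text meant for display, not for parsing as CSV
as_text = df.to_string()
# to_csv() without a path: returns the CSV content as a str
as_csv = df.to_csv(index_label="idx")

print(as_text)
print(as_csv)
```

Only as_csv is what should be handed to create_blob_from_text (or upload_blob in the v12 SDK).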
2. How to read a JSON file in Azure Blob Storage directly into Python?
Sample code:
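# BlockBlobService is from the legacy SDK (azure-storage-blob 2.x and earlier)
import json
from azure.storage.blob import BlockBlobService
accountName = "***"
accountKey = "***"
containerName = "test1"
blobName = "test3.json"
blobService = BlockBlobService(account_name=accountName, account_key=accountKey)
# get_blob_to_text returns a Blob object whose .content holds the blob's text
result = blobService.get_blob_to_text(containerName, blobName)
print(result.content)
# parse the JSON text into Python objects (dicts/lists)
data = json.loads(result.content)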
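get_blob_to_text belongs to the legacy BlockBlobService. With the current azure-storage-blob (v12) SDK, a minimal sketch could look like the following. It assumes container_client is a ContainerClient (created, for example, with ContainerClient.from_connection_string) and that the blob holds UTF-8 JSON; the helper name read_json_blob is hypothetical.

```python
import json


def read_json_blob(container_client, blob_name: str):
    """Download a blob into memory and parse it as JSON (hypothetical helper)."""
    # download_blob returns a StorageStreamDownloader; readall() returns the bytes
    raw = container_client.download_blob(blob_name).readall()
    # json.loads accepts UTF-8 encoded bytes as well as str
    return json.loads(raw)
```

Usage would be data = read_json_blob(container_client, "test3.json"); nothing is written to local disk.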
Hope it helps you.
Upvotes: 22