Reputation: 997
Can someone tell me whether it is possible to read a CSV file directly from Azure Blob Storage as a stream and process it using Python? I know it can be done using C#/.NET (shown below), but I wanted to know the equivalent library in Python.
CloudBlobClient client = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("outfiles");
CloudBlob blob = container.GetBlobReference("Test.csv");
Upvotes: 21
Views: 102518
Reputation: 78
Azure already has an API to read the blob into memory as a bytes object.
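import csv
import os
from io import StringIO
from azure.storage.blob import ContainerClient, StorageStreamDownloader
# `bucket` is the container name and `key` is the blob name
container: ContainerClient = ContainerClient.from_connection_string(os.getenv("BLOB_CONNECTION_STRING"), bucket)
stream: StorageStreamDownloader = container.download_blob(blob=key)
bytes_content = stream.readall()
string_content = bytes_content.decode()
file = StringIO(string_content)
csv_data = csv.reader(file, delimiter=",")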
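Since the question asks about streaming, here is a minimal sketch (not part of the answer above) that parses CSV rows chunk by chunk instead of calling readall(). The helper names `iter_lines` and `iter_csv_rows` are made up for illustration. The only SDK call assumed is `StorageStreamDownloader.chunks()` from azure-storage-blob v12, which yields the blob as bytes chunks; it appears only in a comment so the block runs without Azure.

```python
import codecs
import csv
from typing import Iterable, Iterator, List


def iter_lines(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Yield text lines from byte chunks without buffering the whole blob."""
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""
    for chunk in chunks:
        pending += decoder.decode(chunk)
        # Everything before the last "\n" is complete; the rest stays pending.
        *complete, pending = pending.split("\n")
        for line in complete:
            yield line + "\n"
    pending += decoder.decode(b"", final=True)
    if pending:
        yield pending


def iter_csv_rows(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[List[str]]:
    """Parse CSV rows lazily; csv.reader accepts any iterable of strings."""
    return csv.reader(iter_lines(chunks, encoding))


# With the v12 SDK the chunks would come from the downloader, e.g.:
# for row in iter_csv_rows(container.download_blob(blob=key).chunks()):
#     print(row)

# Local demo with hand-made chunks (no Azure call is made here):
demo_chunks = [b"name,ci", b"ty\nAda,Lon", b"don\n"]
print(list(iter_csv_rows(demo_chunks)))
```

Memory use then depends on the chunk size and the longest line rather than on the size of the blob.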
Upvotes: 0
Reputation: 87
To read from Azure Blob (my goal was to go from a file in Azure Blob Storage to an openpyxl workbook):
import os
from io import BytesIO
import openpyxl
from azure.storage.blob import BlobClient
conn_str = os.environ.get('BLOB_CONN_STR')
container_name = os.environ.get('CONTAINER_NAME')
blob = BlobClient.from_connection_string(conn_str, container_name=container_name,
                                         blob_name="YOUR BLOB PATH HERE FROM AZURE BLOB")
data = blob.download_blob()
workbook_obj = openpyxl.load_workbook(filename=BytesIO(data.readall()))
To write to Azure Blob:
I struggled a lot with this and don't want anyone else to go through the same. If you are using openpyxl and want to write directly from an Azure Function to Blob Storage, the code below does it.
Thanks. HMU if you need any help.
from openpyxl.writer.excel import save_virtual_workbook
# wb is the openpyxl Workbook to upload
blob = BlobClient.from_connection_string(conn_str=conn_str, container_name=container_name, blob_name=r'YOUR_PATH/test1.xlsx')
blob.upload_blob(save_virtual_workbook(wb))
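If `save_virtual_workbook` is not importable in your openpyxl version (as far as I know it was deprecated and later removed), a possible alternative is to save the workbook into an in-memory buffer. This is only a sketch: `workbook_to_bytes` is a made-up helper name, and it relies on `Workbook.save` accepting a file-like object. The upload line is shown as a comment because it needs a real `BlobClient`.

```python
from io import BytesIO


def workbook_to_bytes(workbook) -> bytes:
    """Serialize a workbook to bytes through its save() method, with no temp file."""
    buffer = BytesIO()
    workbook.save(buffer)  # openpyxl's Workbook.save accepts a file-like object
    return buffer.getvalue()


# Hypothetical usage with the names from the answer above:
# blob.upload_blob(workbook_to_bytes(wb), overwrite=True)
```

Passing `overwrite=True` replaces an existing blob of the same name; without it the upload fails if the blob already exists.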
Upvotes: 1
Reputation: 998
I recommend using smart_open.
import os
from azure.storage.blob import BlobServiceClient
from smart_open import open
connect_str = os.environ['AZURE_STORAGE_CONNECTION_STRING']
transport_params = {
    'client': BlobServiceClient.from_connection_string(connect_str),
}
# stream from Azure Blob Storage
with open('azure://my_container/my_file.txt', transport_params=transport_params) as fin:
    for line in fin:
        print(line)
# stream content *into* Azure Blob Storage (write mode):
with open('azure://my_container/my_file.txt', 'wb', transport_params=transport_params) as fout:
    fout.write(b'hello world')
Upvotes: 5
Reputation: 21
Since I wasn't able to find what I needed on this thread, I wanted to follow up on @SebastianDziadzio's answer to retrieve the data without downloading it as a local file, which is what I was trying to find for myself.
Replace the "with" statement with the following:
from io import BytesIO
import pandas as pd
with BytesIO() as input_blob:
    blob_client_instance.download_blob().download_to_stream(input_blob)
    input_blob.seek(0)
    df = pd.read_csv(input_blob, compression='infer', index_col=0)
Upvotes: 2
Reputation: 2176
Here is a simple way to read a CSV from a blob using pandas (it goes through a local file):
import os
import pandas as pd
from azure.storage.blob import BlobServiceClient
service_client = BlobServiceClient.from_connection_string(os.environ['AZURE_STORAGE_CONNECTION_STRING'])
client = service_client.get_container_client("your_container")
bc = client.get_blob_client(blob="your_folder/yourfile.csv")
data = bc.download_blob()
# write the blob to a local file, then load that file with pandas
with open("file.csv", "wb") as f:
    data.readinto(f)
df = pd.read_csv("file.csv")
Upvotes: 2
Reputation: 137
I know this is an old post, but in case someone wants to do the same: I was able to access the blobs with the code below.
Note: you need to set the AZURE_STORAGE_CONNECTION_STRING environment variable. You can get the connection string from the Azure Portal -> go to your storage account -> Settings -> Access keys.
For Windows: setx AZURE_STORAGE_CONNECTION_STRING "<your connection string>"
For Linux: export AZURE_STORAGE_CONNECTION_STRING="<your connection string>"
For macOS: export AZURE_STORAGE_CONNECTION_STRING="<your connection string>"
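Before creating any client it can help to check that the variable is actually visible to the Python process (on Windows, `setx` only takes effect in newly opened shells). The sketch below is an illustration with made-up helper names, `get_connection_string` and `parse_connection_string`; it assumes the usual `Key=Value;Key=Value` layout of a storage connection string and uses an obviously fake value for the demo.

```python
import os


def get_connection_string(var_name: str = "AZURE_STORAGE_CONNECTION_STRING") -> str:
    """Return the connection string or fail early with a readable message."""
    conn_str = os.getenv(var_name)
    if not conn_str:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return conn_str


def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key=Value;Key=Value' into a dict; values may contain '=' themselves."""
    pairs = (item.split("=", 1) for item in conn_str.split(";") if "=" in item)
    return {key.strip(): value for key, value in pairs}


# Fake value for demonstration only; a real one comes from the Azure Portal.
fake = "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=abc123==;EndpointSuffix=core.windows.net"
# Print the account name rather than the whole string, so the key is not logged.
print(parse_connection_string(fake)["AccountName"])
```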
import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
print(connect_str)
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
# get_container_client expects the name of the container
container_client = blob_service_client.get_container_client("Your Storage Name Here")
try:
    print("\nListing blobs...")
    # List the blobs in the container
    blob_list = container_client.list_blobs()
    for blob in blob_list:
        print("\t" + blob.name)
except Exception as ex:
    print('Exception:')
    print(ex)
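To get from the listing to the CSV files the question is about, the blob names can be filtered. A minimal sketch follows, with `csv_blob_names` as a made-up helper. It only relies on each listed item having a `.name` attribute, as in the loop above, and the demo uses stand-in objects instead of a real container.

```python
from types import SimpleNamespace
from typing import Iterable, List


def csv_blob_names(blobs: Iterable, prefix: str = "") -> List[str]:
    """Return names of blobs that start with `prefix` and end in .csv (any case)."""
    return [
        blob.name
        for blob in blobs
        if blob.name.startswith(prefix) and blob.name.lower().endswith(".csv")
    ]


# Stand-in objects for the demo; real items would come from container_client.list_blobs()
sample = [SimpleNamespace(name=n) for n in ("in/a.csv", "in/b.txt", "out/c.CSV")]
print(csv_blob_names(sample, prefix="in/"))
```

In the v12 SDK, `list_blobs` also takes a `name_starts_with` argument, so the prefix part can be done on the server side instead.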
Upvotes: 0
Reputation: 530
Here's a way to do it with the new version of the SDK (12.0.0):
from azure.storage.blob import BlobClient
blob = BlobClient(account_url="https://<account_name>.blob.core.windows.net",
                  container_name="<container_name>",
                  blob_name="<blob_name>",
                  credential="<account_key>")
with open("example.csv", "wb") as f:
    data = blob.download_blob()
    data.readinto(f)
See here for details.
Upvotes: 13
Reputation: 131
Provide your Azure storage account name as account_name and its secret key as account_key here:
# BlockBlobService exists only in the legacy SDK (azure-storage-blob 2.x)
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='$$$$$$', account_key='$$$$$$')
This gets the blob and saves it in the current directory as 'output.jpg':
block_blob_service.get_blob_to_path('your-container-name', 'your-blob', 'output.jpg')
This gets the blob's content as bytes:
blob_item = block_blob_service.get_blob_to_bytes('your-container-name', 'blob-name')
blob_item.content
Upvotes: 4
Reputation: 160
One can stream from a blob with Python like this:
from tempfile import NamedTemporaryFile
from azure.storage.blob.blockblobservice import BlockBlobService
entry_path = conf['entry_path']
container_name = conf['container_name']
blob_service = BlockBlobService(
    account_name=conf['account_name'],
    account_key=conf['account_key'])
def get_file(filename):
    local_file = NamedTemporaryFile()
    blob_service.get_blob_to_stream(container_name, filename, stream=local_file,
                                    max_connections=2)
    local_file.seek(0)
    return local_file
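The file object returned by `get_file` is binary, so it still has to be decoded before the csv module can use it. Here is a minimal sketch of that step; `rows_from_binary_file` is a made-up helper name, and a local temporary file stands in for the downloaded blob. It assumes an encoding such as UTF-8, where splitting the bytes on newlines is safe.

```python
import csv
from tempfile import NamedTemporaryFile


def rows_from_binary_file(binary_file, encoding="utf-8"):
    """Yield CSV rows from a binary file object, decoding one line at a time."""
    lines = (line.decode(encoding) for line in binary_file)
    yield from csv.reader(lines)


# Local stand-in for the file returned by get_file(); no Azure call is made here.
with NamedTemporaryFile() as demo:
    demo.write(b"id,name\n1,alpha\n2,beta\n")
    demo.seek(0)
    rows = list(rows_from_binary_file(demo))
print(rows)
```

With the answer's code this would be called as `rows_from_binary_file(get_file("Test.csv"))`, and the rows are produced lazily as the file is read.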
Upvotes: 5
Reputation: 136126
Yes, it is certainly possible to do so. Check out the Azure Storage SDK for Python:
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='myaccount', account_key='mykey')
block_blob_service.get_blob_to_path('mycontainer', 'myblockblob', 'out-sunset.png')
You can read the complete SDK documentation here: http://azure-storage.readthedocs.io.
Upvotes: 17