Reputation: 23018
In Microsoft Azure we have an Event Hub capturing JSON data and storing it in AVRO format in a blob storage account.
I have written a Python script that lists the AVRO files captured from the Event Hub:
import os, avro
from io import BytesIO
from operator import itemgetter, attrgetter
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
conn_str = 'DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net'
container_name = 'container1'
blob_service_client = BlobServiceClient.from_connection_string(conn_str)
container_client = blob_service_client.get_container_client(container_name)
blob_list = []
for blob in container_client.list_blobs():
    if blob.name.endswith('.avro'):
        blob_list.append(blob)
blob_list.sort(key=attrgetter('creation_time'), reverse=True)
This works well and I get a list of AVRO blobs, sorted by the creation time.
Now I am trying to add the final steps where I would download the blobs, parse the AVRO-formatted data and retrieve the JSON payload.
I try to download each blob in the list into an in-memory buffer and parse it:
for blob in blob_list:
    blob_client = container_client.get_blob_client(blob.name)
    downloader = blob_client.download_blob()
    stream = BytesIO()
    downloader.download_to_stream(stream) # also tried readinto(stream)
    reader = DataFileReader(stream, DatumReader())
    for event_data in reader:
        print(event_data)
    reader.close()
Unfortunately, the above Python code does not work: nothing is printed.
I have also seen that there is a StorageStreamDownloader.readall() method, but I am not sure how to apply it.
I am using Windows 10, Python 3.8.5 and avro 1.10.0 installed with pip.
Upvotes: 3
Views: 2057
Reputation: 30015
The readall() method returns the whole blob content as bytes, so it can be used as below:
with open("xxx", "wb+") as my_file:
    my_file.write(blob_client.download_blob().readall()) # Write blob contents into the file.
For more details about reading captured eventhub data, you can refer to this official doc: Create a Python script to read your Capture files.
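If you would rather keep everything in memory than write a file, the bytes returned by readall() can be wrapped in a BytesIO and handed to DataFileReader. The following is only a minimal sketch, not a tested fix for your exact setup: the helper names decode_body and read_capture_blob are made up for illustration, blob_client is assumed to be the one from your loop, and it assumes each captured record keeps the original JSON as UTF-8 bytes in its Body field, which is how Event Hubs Capture normally stores the payload.

```python
import json
from io import BytesIO


def decode_body(record):
    """Return the JSON payload held in a Capture record's 'Body' field.

    Assumption: the event body is UTF-8 encoded JSON.
    """
    body = record["Body"]
    if isinstance(body, (bytes, bytearray)):
        body = bytes(body).decode("utf-8")
    return json.loads(body)


def read_capture_blob(blob_client):
    """Yield the decoded JSON payload of every event in one captured blob."""
    # Imported lazily so decode_body can be used without avro installed.
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    data = blob_client.download_blob().readall()  # whole blob as bytes
    if not data:
        return  # zero-byte blob, nothing to parse
    # A fresh BytesIO built from bytes starts at position 0.
    reader = DataFileReader(BytesIO(data), DatumReader())
    try:
        for record in reader:
            yield decode_body(record)
    finally:
        reader.close()


# Hypothetical usage with the names from the question:
# for blob in blob_list:
#     client = container_client.get_blob_client(blob.name)
#     for payload in read_capture_blob(client):
#         print(payload)
```

If this still prints nothing, it may be worth checking blob.size for the listed blobs, since a capture file that holds only the AVRO header and no events would produce an empty loop.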
Please let me know if you have any further issues :).
Upvotes: 1