I have downloaded an Avro file (with a JSON payload) from Microsoft Azure to my Windows 10 computer.
Then, with Python 3.8.5 and avro 1.10.0 installed via pip, I tried running the following script:
import avro.schema
from avro.datafile import DataFileReader
from avro.io import DatumReader

reader = DataFileReader(open("48.avro", "rb"), DatumReader())
for d in reader:
    print(d)
reader.close()
Unfortunately, nothing is printed by the script.
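Before suspecting the reader, it is worth checking whether the file holds any records at all. Per the Avro specification, an object container file starts with the magic bytes `Obj\x01`, a metadata map (which includes the writer's schema under the key `avro.schema`, so a reader does not need one supplied), and a 16-byte sync marker, followed by zero or more data blocks, each prefixed with an object count. A file with a valid header but no data blocks reads without error and simply yields nothing. Below is a minimal, stdlib-only sketch that walks that layout; `inspect_container` and `read_long` are hypothetical helper names, and the sketch has not been checked against real Event Hubs capture files.

```python
import io
from typing import BinaryIO, Dict, Tuple

MAGIC = b"Obj\x01"  # object container magic bytes, per the Avro spec
SYNC_SIZE = 16      # sync marker written after the header and after each block


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or fail, so truncated files are reported."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_long(stream: BinaryIO) -> int:
    """Decode one Avro 'long': a zig-zag encoded, variable-length integer."""
    shift = 0
    raw = 0
    while True:
        b = read_exact(stream, 1)[0]
        raw |= (b & 0x7F) << shift
        if not b & 0x80:  # high bit clear: last byte of the varint
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo zig-zag encoding


def inspect_container(stream: BinaryIO) -> Tuple[Dict[str, bytes], int, int]:
    """Return (header metadata, number of data blocks, number of records)."""
    if read_exact(stream, len(MAGIC)) != MAGIC:
        raise ValueError("not an Avro object container file")

    # Header metadata is an Avro map<bytes>, written as blocks of key/value pairs.
    meta: Dict[str, bytes] = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a block byte size follows, ignored here
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_exact(stream, read_long(stream)).decode("utf-8")
            meta[key] = read_exact(stream, read_long(stream))
    sync = read_exact(stream, SYNC_SIZE)

    # Data blocks: object count, byte size, serialized objects, sync marker.
    blocks = records = 0
    while stream.read(1):  # any remaining byte means another block follows
        stream.seek(-1, io.SEEK_CUR)  # un-read the probe byte
        count = read_long(stream)
        size = read_long(stream)
        read_exact(stream, size)  # skip the (possibly compressed) payload
        if read_exact(stream, SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch; file may be corrupt")
        blocks += 1
        records += count
    return meta, blocks, records
```

For example, if `inspect_container(open("48.avro", "rb"))` reports zero blocks, that alone would explain why the loop prints nothing. Record counts are taken from the block headers, so this works whatever `avro.codec` the file uses: compressed payloads are skipped, never decoded.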
After searching around, I tried passing an explicit schema, as below:
schema_str = """
{
"type" : "record",
"name" : "EventData",
"namespace" : "Microsoft.ServiceBus.Messaging",
"fields" : [ {
"name" : "SequenceNumber",
"type" : "long"
}, {
"name" : "Offset",
"type" : "string"
}, {
"name" : "EnqueuedTimeUtc",
"type" : "string"
}, {
"name" : "SystemProperties",
"type" : {
"type" : "map",
"values" : [ "long", "double", "string", "bytes" ]
}
}, {
"name" : "Properties",
"type" : {
"type" : "map",
"values" : [ "long", "double", "string", "bytes", "null" ]
}
}, {
"name" : "Body",
"type" : [ "null", "bytes" ]
} ]
}
"""
schema = avro.schema.parse(schema_str)
reader = DataFileReader(open("48.avro", "rb"), DatumReader(schema, schema))
for d in reader:
    print(d)
reader.close()
This didn't help either: still nothing is printed, whereas I expected a list of dictionaries.
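Once a file does contain events, iterating `DataFileReader` yields one dict per record, with keys matching the EventData schema above. The JSON payload is not parsed for you: it stays as raw bytes in `Body`. A minimal sketch for turning those bytes into Python objects, assuming the producer sent UTF-8 encoded JSON (an assumption about the data; `decode_body` and `decode_bodies` are hypothetical helpers):

```python
import json
from typing import Any, Iterable, List, Optional


def decode_body(record: dict) -> Optional[Any]:
    """Parse the Body of one EventData record, assuming UTF-8 JSON.

    Body is declared as ["null", "bytes"] in the schema, so it may be None.
    """
    body = record.get("Body")
    if body is None:
        return None
    return json.loads(body.decode("utf-8"))


def decode_bodies(records: Iterable[dict]) -> List[Any]:
    """Collect decoded payloads from any iterable of records."""
    return [decode_body(r) for r in records]
```

For example, `decode_bodies(reader)` could take the place of the `print` loop, before `reader.close()` is called.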
UPDATE:
I got a reply on the mailing list that avro-python3 is deprecated.
However, my issue persists with the original avro package: nothing is printed.
UPDATE 2:
I have to apologize: the Avro file I was using did not contain any useful data. The reason for my confusion is that a colleague was using a different file with the same name while testing for me.
I have now tried both the avro and fastavro modules on a different Avro file, and both worked. I will look at PySpark as well.
As OneCricketeer suggested, use PySpark to read the Avro files generated by Event Hubs. The question "PySpark: Deserializing an Avro serialized message contained in an eventhub capture avro file" is one such example.