Alexander Farber

Reputation: 22958

How to use avro-python3 on Windows 10 to parse files?

I have downloaded an AVRO file (with JSON payload) from Microsoft Azure to my Windows 10 computer:

[screenshot: Azure Event Hub]

Then, with Python 3.8.5 and avro 1.10.0 installed via pip, I tried running the following script:

import avro.schema
from avro.datafile import DataFileReader
from avro.io import DatumReader

# With no schema given, the reader uses the writer schema stored in the file header.
reader = DataFileReader(open("48.avro", "rb"), DatumReader())
for d in reader:
    print(d)
reader.close()

Unfortunately, nothing is printed by the script.

Then I searched around and tried adding a schema, as shown below:

schema_str = """
{
  "type" : "record",
  "name" : "EventData",
  "namespace" : "Microsoft.ServiceBus.Messaging",
  "fields" : [ {
    "name" : "SequenceNumber",
    "type" : "long"
  }, {
    "name" : "Offset",
    "type" : "string"
  }, {
    "name" : "EnqueuedTimeUtc",
    "type" : "string"
  }, {
    "name" : "SystemProperties",
    "type" : {
      "type" : "map",
      "values" : [ "long", "double", "string", "bytes" ]
    }
  }, {
    "name" : "Properties",
    "type" : {
      "type" : "map",
      "values" : [ "long", "double", "string", "bytes", "null" ]
    }
  }, {
    "name" : "Body",
    "type" : [ "null", "bytes" ]
  } ]
}
"""
schema = avro.schema.parse(schema_str)
reader = DataFileReader(open("48.avro", "rb"), DatumReader(schema, schema))
for d in reader:
    print(d)
reader.close()

But this hasn't helped; still nothing is printed.

I was expecting a list of dictionary objects to be printed.

UPDATE:

I got a reply on the mailing list saying that avro-python3 is deprecated.

Still, my issue persists with the original avro package: nothing is printed.

UPDATE 2:

I have to apologize: the Avro file I was using did not contain any useful data. I was confused because a colleague was testing for me with a different file that had the same name.
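A file with a valid header but no records produces exactly this symptom: the loop runs zero times and nothing is printed. One way to check for that without any Avro library is to walk the container file by hand. The sketch below assumes the standard Avro object container layout (4-byte magic, header metadata map, 16-byte sync marker, then data blocks that each start with a record count and a byte size). The names read_long and count_avro_records are hypothetical helpers, not part of avro or fastavro.

```python
AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file
SYNC_SIZE = 16           # length of the sync marker after the header and each block


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag-encoded) from a binary stream."""
    shift = 0
    accum = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("unexpected end of file while reading a long")
        byte = raw[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag encoding


def count_avro_records(stream):
    """Return the total number of records declared by the file's data blocks."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")

    # Skip the header metadata, an Avro map of string keys to bytes values.
    while True:
        pairs = read_long(stream)
        if pairs == 0:
            break
        if pairs < 0:
            read_long(stream)  # a negative count is followed by a byte size
            pairs = -pairs
        for _ in range(pairs):
            stream.seek(read_long(stream), 1)  # skip the key
            stream.seek(read_long(stream), 1)  # skip the value
    stream.seek(SYNC_SIZE, 1)  # skip the header sync marker

    total = 0
    while True:
        if not stream.read(1):  # clean end of file: no more blocks
            break
        stream.seek(-1, 1)
        total += read_long(stream)       # number of records in this block
        block_size = read_long(stream)   # size of the serialized records
        stream.seek(block_size + SYNC_SIZE, 1)
    return total
```

Calling count_avro_records(open("48.avro", "rb")) on a file with no records should return 0, which would explain an empty loop regardless of the schema passed to DatumReader.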

I have now tried both the avro and fastavro modules on a different Avro file, and both worked. I will look at PySpark as well.
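With either module, each record comes back as a dictionary, and according to the schema above the Body field holds raw bytes or null. Since the payload here is JSON, a small helper can turn it into Python objects. This is only a sketch: decode_body is a hypothetical name, and UTF-8 is an assumption about how the sender encoded the payload.

```python
import json


def decode_body(record, encoding="utf-8"):
    """Parse the JSON payload stored in a record's Body field.

    Returns None when Body is missing or null.
    The default encoding (UTF-8) is an assumption, not something the file guarantees.
    """
    body = record.get("Body")
    if body is None:
        return None
    return json.loads(body.decode(encoding))
```

Inside the reading loop this would be used as print(decode_body(d)) instead of print(d).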

Upvotes: 0

Views: 525

Answers (1)

user7788539

Reputation: 89

As OneCricketeer suggested, use PySpark to read Avro files generated by Event Hub. One such example is the question "PySpark: Deserializing an Avro serialized message contained in an eventhub capture avro file".

Upvotes: 2
