bhdrozgn

Reputation: 197

Deserialization, fixed data type in Avro

I am new to Avro and I have an Avro file to deserialize. Some schemas use the fixed data type to store MAC addresses. The schema below is one of those, and it is used as a type in several other schemas.

The schema for MAC addresses looks like this:

{
    "type": "fixed",
    "name": "MacAddress",
    "size": 6
}
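For context: the Avro specification encodes a fixed value as exactly `size` raw bytes with no length prefix, so a Python reader naturally hands it back as a `bytes` object. A minimal stdlib-only sketch of that rule, assuming the stream is already positioned at the start of the field (`read_fixed` is a hypothetical helper, not part of the avro library):

```python
import io
from typing import BinaryIO


def read_fixed(stream: BinaryIO, size: int) -> bytes:
    """Read one Avro fixed value: exactly `size` raw bytes, no length prefix.

    Hypothetical helper for illustration; the real avro reader does this internally.
    """
    data = stream.read(size)
    if len(data) != size:
        raise EOFError(f"expected {size} bytes, got {len(data)}")
    return data


# Six raw bytes (size 6, as in the MacAddress schema) come back as a bytes object.
buf = io.BytesIO(b"\x36\xe9\xad\x64\x2d\x3d")
mac = read_fixed(buf, 6)
print(mac)  # b'6\xe9\xadd-='
```

Note that the printed form already shows some bytes as plain characters (`6`, `d`, `-`, `=`), because Python only escapes bytes outside the printable ASCII range.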

I wrote the first record of the data to a text file using:

from avro.datafile import DataFileReader
from avro.io import DatumReader

reader = DataFileReader(open("data.avro", "rb"), DatumReader())
count = 0
for record in reader:
    if count == 0:
        with open('first_record.txt', 'w') as first_record:
            first_record.write(str(record))
    elif count > 0: break
    count = count + 1
reader.close()

The MAC addresses mentioned above appear in the deserialized data like this:

"MacAddress":"b""\\x36\\xe9\\xad\\x64\\x2d\\x3d",

I know that \x means the following is a hexadecimal value, so this is supposed to be "36:e9:ad:64:2d:3d", right? Are "b""" style values the expected output for fixed types?
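If the goal is the usual colon-separated notation, the bytes can be formatted directly rather than read off the printed representation. A minimal sketch, assuming the field always holds exactly 6 bytes as the MacAddress schema declares (`format_mac` is a hypothetical helper name):

```python
def format_mac(raw: bytes) -> str:
    """Render a 6-byte MacAddress value as colon-separated lowercase hex."""
    if len(raw) != 6:
        raise ValueError(f"expected 6 bytes, got {len(raw)}")
    # bytes.hex(sep) needs Python 3.8+;
    # ":".join(f"{b:02x}" for b in raw) works on older versions.
    return raw.hex(":")


print(format_mac(b"\x36\xe9\xad\x64\x2d\x3d"))  # 36:e9:ad:64:2d:3d
```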

Also, some values are like below:

"Addr":"b""j\\x26\\xb7\\xda\\x1d\\xf6"

"Addr":"b""\\x28\\xcb\\xc5v\\x14%" 

How come these are MAC addresses? What do the j and % characters mean?

Upvotes: 1

Views: 760

Answers (1)

Scott

Reputation: 2074

Are "b""" style values the expected output for fixed types?

Yes. A fixed type holds raw bytes, and in Python a bytes value is written with a b prefix before the quoted string. It looks like you have a lot of extra quotes in there, and I'm guessing that's because you are doing things like str(record), which is probably causing the extra backslashes and quote characters. For example:

>>> str(b"\xae")
"b'\\xae'"
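One way to avoid writing Python reprs to the text file is to serialize the record as JSON and turn bytes values into hex strings on the way out. A minimal sketch, assuming every bytes field in the record should be rendered as colon-separated hex (`dump_record` and `_encode_bytes` are hypothetical names, not avro APIs):

```python
import json


def _encode_bytes(obj):
    """Fallback for json.dumps: render bytes (e.g. Avro fixed values) as hex."""
    if isinstance(obj, bytes):
        return obj.hex(":")  # Python 3.8+
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def dump_record(record: dict) -> str:
    """Serialize a deserialized Avro record (a dict) to a JSON string."""
    # json.dumps calls `default` for values it cannot handle natively, such as bytes,
    # including bytes nested inside dicts or lists.
    return json.dumps(record, default=_encode_bytes)


print(dump_record({"MacAddress": b"\x36\xe9\xad\x64\x2d\x3d"}))
# {"MacAddress": "36:e9:ad:64:2d:3d"}
```

In the question's loop, `first_record.write(dump_record(record))` would take the place of `first_record.write(str(record))`.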

How come these are MAC addresses? What do the j and % characters mean?

Are you sure these are the same record type? The key is Addr instead of MacAddress so it seems like it might be a different record type and schema.
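Separately, the j, v and % characters are not extra data. Python's bytes representation prints any byte in the printable ASCII range as the character itself and escapes only the rest, so j is 0x6a, v is 0x76 and % is 0x25. Counted that way, both Addr values are exactly 6 bytes long, which is consistent with (though not proof of) the 6-byte MacAddress fixed type. A quick check, using the literals from the question:

```python
# The two Addr values from the question, as Python bytes literals.
addrs = [b"j\x26\xb7\xda\x1d\xf6", b"\x28\xcb\xc5v\x14%"]

# Printable characters in a bytes literal are just bytes with that ASCII code.
print(b"j" == b"\x6a", b"v" == b"\x76", b"%" == b"\x25")  # True True True

for raw in addrs:
    # Length in bytes, then colon-separated hex (bytes.hex(sep) needs Python 3.8+).
    print(len(raw), raw.hex(":"))
# 6 6a:26:b7:da:1d:f6
# 6 28:cb:c5:76:14:25
```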

Upvotes: 2
