teoreda

Reputation: 2570

Python - schema-less Apache Avro data serialization

I'm trying to exchange serialized messages through a Kafka broker using Python 2.7 and Apache Avro (the Python client). I would like to know if there is a way to exchange messages without creating a schema first.

This is the code (using a schema, sensor.avsc, which is exactly what I want to avoid):

from kafka import SimpleProducer, KafkaClient
import avro.schema
import avro.io
import io, random

# To send messages synchronously
kafka = KafkaClient('localhost:9092')
producer = SimpleProducer(kafka, async=False)

# Kafka topic
topic = "sensor_network_01"

# Path to the sensor.avsc schema (the part I want to avoid)
schema_path="sensor.avsc"
schema = avro.schema.parse(open(schema_path).read())


for i in xrange(100):
    writer = avro.io.DatumWriter(schema)
    bytes_writer = io.BytesIO()
    encoder = avro.io.BinaryEncoder(bytes_writer)
    # build a record with a random reading and serialize it
    writer.write({"sensor_network_name": "Sensor_1", "value": random.randint(0, 10), "threshold_value": 10}, encoder)

    raw_bytes = bytes_writer.getvalue()
    producer.send_messages(topic, raw_bytes)

This is the sensor.avsc file:

{
    "namespace": "sensors.avro",
    "type": "record",
    "name": "Sensor",
    "fields": [
        {"name": "sensor_network_name", "type": "string"},
        {"name": "value",  "type": ["int", "null"]},
        {"name": "threshold_value", "type": ["int", "null"]}
    ]
}

Upvotes: 1

Views: 2722

Answers (2)

tonicebrian

Reputation: 4795

This code:

import avro.schema
import avro.io
import io, random

# Path to user.avsc avro schema
schema_path="user.avsc"
schema = avro.schema.parse(open(schema_path).read())


for i in xrange(1):
    writer = avro.io.DatumWriter(schema)
    bytes_writer = io.BytesIO()
    encoder = avro.io.BinaryEncoder(bytes_writer)
    writer.write({"name": "123", "favorite_color": "111", "favorite_number": random.randint(0,10)}, encoder)
    # raw Avro bytes, ready to be sent over the wire
    raw_bytes = bytes_writer.getvalue()

    print(raw_bytes)

    # decode the bytes back into a Python dict using the same schema
    bytes_reader = io.BytesIO(raw_bytes)
    decoder = avro.io.BinaryDecoder(bytes_reader)
    reader = avro.io.DatumReader(schema)
    user1 = reader.read(decoder)
    print(" USER = {}".format(user1))

for dealing with this schema

{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}

is what you need.

Credit goes to this gist.
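
For the consuming side of the original question, the same DatumReader pattern can be applied to each message pulled off the topic. A minimal sketch, assuming kafka-python's KafkaConsumer and the sensor.avsc schema from the question:

from kafka import KafkaConsumer
import avro.schema
import avro.io
import io

# parse the same schema the producer used
schema = avro.schema.parse(open("sensor.avsc").read())
reader = avro.io.DatumReader(schema)

consumer = KafkaConsumer("sensor_network_01", bootstrap_servers="localhost:9092")

for msg in consumer:
    # msg.value holds the raw Avro bytes sent by the producer
    bytes_reader = io.BytesIO(msg.value)
    decoder = avro.io.BinaryDecoder(bytes_reader)
    print(reader.read(decoder))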

Upvotes: 4

Austin Wilkins

Reputation: 29

I haven't seen anyone do this, but have wanted it myself. You might have to write it yourself, but it shouldn't be too bad: assuming the object to serialize is simple, all you would have to do is loop through its fields and map Python types to Avro types. Nested fields would require something like recursion to dig into each object.
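
A minimal sketch of that idea, in Python 2 to match the question; the infer_schema helper and the PY_TO_AVRO map are names I made up, and it only handles flat values and nested dicts (lists, unions and None values would need extra cases):

import json
import avro.schema

# assumed mapping from Python types to Avro primitive types
PY_TO_AVRO = {str: "string", unicode: "string", int: "int",
              long: "long", float: "double", bool: "boolean"}

def infer_schema(name, obj):
    # build an Avro record schema (as a dict) from a sample Python dict
    fields = []
    for key, value in obj.items():
        if isinstance(value, dict):
            # nested objects become nested records, handled recursively
            fields.append({"name": key, "type": infer_schema(key, value)})
        else:
            fields.append({"name": key, "type": PY_TO_AVRO[type(value)]})
    return {"type": "record", "name": name, "fields": fields}

# infer a schema from one sample message, then parse it as usual
sample = {"sensor_network_name": "Sensor_1", "value": 3, "threshold_value": 10}
schema = avro.schema.parse(json.dumps(infer_schema("Sensor", sample)))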

Upvotes: 0
