Apache beam read schema from pubsub

Question

I am very new to Beam and to reading streaming data, so I hope my question is not too trivial.

I am using the Beam Python SDK to read data from PubSub before writing it to some other files. Since the data I receive is always in the same format, I tried to use the schema feature to parse the data I receive from PubSub.

The data received is always a dictionary of the form {"name": "my_name", "value": 42}, so my pipeline looks like this:

import typing

import apache_beam as beam
from apache_beam.io import ReadFromPubSub


class MySchema(typing.NamedTuple):
    name: str
    value: int

with beam.Pipeline() as pipeline:
    pipeline | ReadFromPubSub(topic=).with_output_types(MySchema)

However, I then get the error apache_beam.typehints.decorators.TypeCheckError: Output type hint violation at ReadFromPubSub: expected , got

This makes sense, since PubSub delivers messages as bytes. If I parse the data into a dictionary first, it seems to work:

import json

with beam.Pipeline() as pipeline:
    (pipeline
        | ReadFromPubSub(topic=)
        | beam.Map(lambda x: json.loads(x.decode("utf8"))).with_output_types(MySchema)
    )

It seems to work fine, but doesn't having to parse the data into a dictionary defeat the purpose of the schema? Is there a more straightforward way of doing this?
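For illustration, here is a minimal sketch of a parsing step that returns actual MySchema instances instead of plain dictionaries. It uses only the standard library, so it can be tried without a PubSub topic. The helper name parse_message is hypothetical, and the sketch assumes every payload is UTF-8 encoded JSON containing both keys.

```python
import json
import typing


class MySchema(typing.NamedTuple):
    name: str
    value: int


def parse_message(data: bytes) -> MySchema:
    """Decode one message payload (assumed UTF-8 JSON) into a MySchema row."""
    record = json.loads(data.decode("utf-8"))
    # Build the NamedTuple explicitly so that a missing key raises a KeyError
    # instead of silently producing an element that does not match the schema.
    return MySchema(name=str(record["name"]), value=int(record["value"]))


# Stand-in for one message body as it would arrive from PubSub.
row = parse_message(b'{"name": "my_name", "value": 42}')
print(row)
```

In the pipeline, beam.Map(parse_message).with_output_types(MySchema) could then take the place of the lambda, so that the declared output type and the elements themselves agree. This still involves an explicit decoding step; it only moves that step into one named function.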

Answers (1)
