Saransh

Reputation: 147

BigQuery Table to a Pub/Sub Topic not working in Apache Beam Python SDK? Static source to streaming sink

My basic requirement was to create a pipeline that reads from a BigQuery table, converts each row to JSON, and publishes it to a Pub/Sub topic.


At first I read from BigQuery and tried to write it straight to a Pub/Sub topic, but got an exception saying Pub/Sub is not supported for batch pipelines. So I worked around it in Python by splitting the job into two scripts:

import json

import apache_beam as beam

p = beam.Pipeline(options=options)

json_string_output = (
    p
    | 'Read from BQ' >> beam.io.ReadFromBigQuery(
          query='SELECT * FROM `project.dataset.table_name`',
          use_standard_sql=True)
    | 'convert to json' >> beam.Map(lambda record: json.dumps(record))
    | 'Write results' >> beam.io.WriteToText(outputs_prefix)
)

p.run()
Then a separate script reads the text output back and publishes each line:

from google.cloud import pubsub_v1

# create publisher
publisher = pubsub_v1.PublisherClient()

with open(input_file, 'rb') as ifp:
    header = ifp.readline()  # skip the header line
    # loop over each record
    for line in ifp:
        event_data = line   # entire line of input file is the message
        print('Publishing {0} to {1}'.format(event_data, pubsub_topic))
        publisher.publish(pubsub_topic, event_data)
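One detail worth noting when merging the two steps: Pub/Sub message payloads are byte strings (the second script already publishes raw bytes read from the file), so the `json.dumps` output would need an explicit UTF-8 encode. A minimal sketch of that conversion; the helper name `record_to_pubsub_bytes` is mine, not from either script:

```python
import json

def record_to_pubsub_bytes(record):
    # Pub/Sub payloads are bytes; serialize the BigQuery row dict
    # to JSON, then encode as UTF-8.
    return json.dumps(record).encode("utf-8")

# Example row shaped like a dict returned by ReadFromBigQuery:
payload = record_to_pubsub_bytes({"id": 1, "name": "alice"})
print(payload)  # → b'{"id": 1, "name": "alice"}'
```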

Python working code repo

I'm not able to find a way to integrate both scripts into a single Apache Beam pipeline.

Upvotes: 0

Views: 321

Answers (1)

Kenn Knowles

Reputation: 6023

Because your pipeline does not have any unbounded PCollections, it will be automatically run in batch mode. You can force a pipeline to run in streaming mode with the --streaming command line flag.
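For example, assuming the pipeline above lives in a file such as `bq_to_pubsub.py` (the filename, project, region, and bucket below are placeholders), the flag is passed at launch alongside the usual Dataflow options:

```shell
python bq_to_pubsub.py \
    --streaming \
    --runner DataflowRunner \
    --project my-project \
    --region us-central1 \
    --temp_location gs://my-bucket/tmp
```

With streaming enabled, the `WriteToText` step in the question can be replaced by `beam.io.WriteToPubSub` (which expects byte-string elements), removing the need for the separate publisher script.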

Upvotes: 1

Related Questions