Reputation: 147
My basic requirement was to create a pipeline that reads from a BigQuery table, converts each row to JSON, and publishes it to a Pub/Sub topic.
At first I read from BigQuery and tried to write directly to a Pub/Sub topic, but got an exception saying Pub/Sub is not supported for batch pipelines. So I tried some workarounds, and was able to get this working in Python with two separate scripts:
import json

import apache_beam as beam

p = beam.Pipeline(options=options)

json_string_output = (
    p
    | 'Read from BQ' >> beam.io.ReadFromBigQuery(
        query='SELECT * FROM `project.dataset.table_name`',
        use_standard_sql=True)
    | 'Convert to JSON' >> beam.Map(lambda record: json.dumps(record))
    # Writes sharded text files named <outputs_prefix>-00000-of-0000N
    | 'Write results' >> beam.io.WriteToText(outputs_prefix)
)

# Block until the files are written, since the next script reads them
p.run().wait_until_finish()
from google.cloud import pubsub_v1

# Create the publisher client
publisher = pubsub_v1.PublisherClient()

# input_file is one of the text shards written by the pipeline above
with open(input_file, 'rb') as ifp:
    # First line is treated as a header and skipped
    header = ifp.readline()
    # Loop over each record and publish it as a Pub/Sub message
    for line in ifp:
        event_data = line  # entire line of the input file is the message
        print('Publishing {0} to {1}'.format(event_data, pubsub_topic))
        publisher.publish(pubsub_topic, event_data)
I'm not able to find a way to integrate both scripts within a single Apache Beam pipeline.
Upvotes: 0
Views: 321
Reputation: 6023
Because your pipeline does not have any unbounded PCollections, it will automatically be run in batch mode. You can force a pipeline to run in streaming mode with the --streaming command line flag.
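As a minimal sketch of how that could look for your case (the topic path and query here are hypothetical placeholders, and streaming=True is the programmatic equivalent of passing --streaming): write to Pub/Sub with beam.io.WriteToPubSub, which expects bytes, so encode the JSON strings first.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical values -- replace with your own project, dataset, and topic
topic = 'projects/project/topics/your_topic'

# Same effect as passing --streaming on the command line
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | 'Read from BQ' >> beam.io.ReadFromBigQuery(
            query='SELECT * FROM `project.dataset.table_name`',
            use_standard_sql=True)
        | 'Convert to JSON' >> beam.Map(json.dumps)
        # WriteToPubSub (without attributes) expects bytes payloads
        | 'Encode' >> beam.Map(lambda s: s.encode('utf-8'))
        | 'Write to Pub/Sub' >> beam.io.WriteToPubSub(topic=topic)
    )

Note that ReadFromBigQuery is still a bounded source; the flag just puts the pipeline in streaming mode so the runner accepts the unbounded Pub/Sub sink in the same pipeline.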
Upvotes: 1