Sam

Reputation: 766

Cloud Dataflow: How to use Google-provided Templates for PubSub to BigQuery

I am using PubSub to capture real-time data, then using GCP Dataflow to stream the data into BigQuery. I am using Java for the Dataflow pipeline.

I want to try out the Google-provided templates in Dataflow. The process is: PubSub --> Dataflow --> BigQuery

Currently I am sending messages in string format into PubSub (using Python here). But the template in Dataflow only accepts JSON messages, and the Python library is not letting me publish a JSON message. Can anyone suggest a way to publish a JSON message to PubSub so that I can use the Dataflow template to do the job?

Upvotes: 0

Views: 1038

Answers (1)

Zhou Yunqing

Reputation: 444

The Google-provided pipeline that pumps data from PubSub to BQ now assumes JSON-formatted messages and a matching table schema on the BigQuery side.

Publishing JSON to PubSub is no different from publishing strings. You can use the following snippet to convert a Python dict to a JSON string:

import json
py_dict = {"name" : "Peter", "locale" : "en-US"}
json_string = json.dumps(py_dict)
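Since PubSub message bodies are raw bytes, the JSON string still needs to be UTF-8 encoded before publishing. A minimal sketch of the round trip, using only the standard library (the actual publish call from the `google-cloud-pubsub` client is shown in a comment, as it requires a live topic):

```python
import json

# The record to publish; keys should match the BigQuery table schema.
record = {"name": "Peter", "locale": "en-US"}

# Serialize to a JSON string, then to UTF-8 bytes --
# PubSub's publish() expects the message body as bytes.
payload = json.dumps(record).encode("utf-8")

# With google-cloud-pubsub, you would then publish it roughly like:
#   publisher = pubsub_v1.PublisherClient()
#   topic_path = publisher.topic_path("my-project", "my-topic")
#   publisher.publish(topic_path, data=payload)

# On the Dataflow side, the template decodes and parses the bytes back
# into a JSON object before writing the row to BigQuery:
parsed = json.loads(payload.decode("utf-8"))
assert parsed == record
```

Here `"my-project"` and `"my-topic"` are placeholder names; substitute your own project and topic.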

If you'd like to customize the pipeline heavily, you can also take the source code at the following location and build your own:

https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/templates/PubSubToBigQuery.java

Upvotes: 2
