Asad

Reputation: 110

Pub/Sub CSV data to Dataflow to BigQuery

My pipeline is IoT Core -> Pub/Sub -> Dataflow -> BigQuery. Initially the data I was getting was in JSON format and the pipeline was working properly. Now I need to shift to CSV, and the issue is that the Google-defined Dataflow template I was using takes JSON input instead of CSV. Is there an easy way of transferring CSV data from Pub/Sub to BigQuery through Dataflow? The template could probably be changed, but it is implemented in Java, which I have never used, so modifying it would take a long time. I also considered implementing an entire custom template in Python, but that would take too long as well. Here is a link to the template provided by Google: https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/templates/PubSubToBigQuery.java

Sample: currently my Pub/Sub messages are JSON, and these work correctly:

"{"Id":"123","Temperature":"50","Charge":"90"}"

But I need to change this to comma-separated values:

"123,50,90"

Upvotes: 0

Views: 710

Answers (2)

Vibhor Gupta

Reputation: 699

Can you please share your existing Python code where you are parsing the JSON-format data, along with samples of the old and new data, so that it can be customized accordingly?

Moreover, you can refer to the Python code here; it performs word-count transformation logic over a PCollection, and hopefully it can give you some reference for customizing your code accordingly.
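For illustration, here is a minimal Apache Beam (Python) sketch of such a custom pipeline. The project, topic, and table names are hypothetical, and the CSV column order is assumed to match the JSON sample from the question:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def csv_to_row(payload):
        # Map a CSV payload such as b"123,50,90" onto the same columns
        # the JSON messages used.
        record_id, temperature, charge = payload.decode("utf-8").split(",")
        return {"Id": record_id, "Temperature": temperature, "Charge": charge}


    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/my-topic")  # hypothetical topic
            | "ParseCsv" >> beam.Map(csv_to_row)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:my_dataset.my_table",  # hypothetical table
                schema="Id:STRING,Temperature:STRING,Charge:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )

This can be run on Dataflow by passing the usual runner and project pipeline options; the only logic that differs from the JSON version is the csv_to_row parsing step.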

Upvotes: 1

guillaume blaquiere

Reputation: 75715

Very easy: do nothing!! If you have a look at this line, you can see that the type of the messages used is the Pub/Sub message JSON, not your JSON content.

So, to prevent any issues (when querying and inserting), write to another table and it should work nicely!
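To illustrate that point: at the Pub/Sub level the payload is just opaque bytes, so publishing the CSV looks exactly like publishing the JSON did. A minimal sketch (the project and topic names here are hypothetical):

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "my-topic")  # hypothetical names

    # Both payloads are published the same way: as opaque bytes in the
    # Pub/Sub message envelope.
    publisher.publish(topic_path, b'{"Id":"123","Temperature":"50","Charge":"90"}')
    publisher.publish(topic_path, b"123,50,90")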

Upvotes: 1
