Code Geass

Reputation: 291

How to use existing PubSub Subscription with Google-Provided PubSub to BigQuery Dataflow Template

I am trying to set up a Dataflow job using the Google-provided template PubSub to BigQuery. I see an option to specify the Cloud Pub/Sub input topic, but I don't see any option to specify a Pub/Sub input subscription in the GCP console UI.

If I provide the topic, the job automatically creates a subscription to read messages from it. The problem is that the job then only sees messages published to the topic after the Dataflow job has started; anything published to the same topic before that is ignored.

I don't have any complex transformations to do in my job, so the Google-provided template would work for me out of the box. But the inability to specify my own subscription is bothering me, and I don't want to set up a custom pipeline just for this reason. Does anybody know of a workaround?
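For reference, this is roughly the setup I'm after. A subscription retains messages published after it is created, so creating one ahead of time would let the job drain that backlog on startup (the names below are placeholders):

# Create the subscription up front so Pub/Sub buffers messages
# published before any Dataflow job attaches to it.
gcloud pubsub subscriptions create my-subscription \
  --topic=my-topic \
  --ack-deadline=60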

Upvotes: 4

Views: 515

Answers (2)

blasteye

Reputation: 68

That's not currently supported. However, it's a great use case and is on the Google Cloud Team's radar.

Upvotes: 4

slve

Reputation: 11

As an update, there is now a separate PubSub Subscription to BigQuery template.

https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#pubsub-subscription-to-bigquery

# Run the template against an existing subscription; note that
# outputTableSpec must be fully qualified as project:dataset.table.
gcloud dataflow jobs run $jobname \
  --project=$project \
  --disable-public-ips \
  --gcs-location gs://dataflow-templates-$location/latest/PubSub_Subscription_to_BigQuery \
  --worker-machine-type n1-standard-1 \
  --region $location \
  --staging-location gs://$bucket/pss-to-bq \
  --parameters inputSubscription=projects/$project/subscriptions/$subscription,outputTableSpec=$project:$dataset.$table
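As a sanity check, you could publish a test message and confirm the job is running. This sketch assumes $topic is the topic backing $subscription and that the JSON payload matches the BigQuery table schema:

# Publish a test message to the topic the subscription is attached to.
gcloud pubsub topics publish $topic --message='{"name":"test"}'

# Confirm the streaming job is active in the chosen region.
gcloud dataflow jobs list --region=$location --status=active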

Upvotes: 1
