user1965449

Reputation: 2931

How to ingest data from a GCS bucket via Dataflow as soon as a new file is put into it?

I have a use case where I need to ingest data from a Google Cloud Storage bucket via Dataflow as soon as it becomes available as a new file in the bucket.

How do I trigger the execution of the Dataflow job as soon as a new file is added to the storage bucket?

Upvotes: 1

Views: 1913

Answers (1)

Graham Polley

Reputation: 14781

If your pipelines are written in Java, then you can use Cloud Functions and Dataflow templating.

I'm going to assume you're using the 1.x SDK (it's also possible with 2.x).

  1. Write your Pipeline and specify the "TemplatingDataflowPipelineRunner" as the runner
  2. Write a Cloud Function that is set up to listen and react to new objects (in this case CSV files) that arrive in your bucket.
  3. The Cloud Function kicks off the Dataflow pipeline, and passes the name of the new file as a parameter to it.
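
As a rough sketch of steps 2 and 3, the function below builds the request a Cloud Function could send to the Dataflow `templates:launch` REST endpoint when a new object lands in the bucket. It's written in Python and uses only the standard library. The project ID, template path, and the `inputFile` parameter name are all placeholders I made up, so swap in whatever your own pipeline options are called. Actually sending the request (an authenticated POST) is left out.

```python
import json
import re
from urllib.parse import urlencode

# Hypothetical values: replace with your own project and staged template.
PROJECT_ID = "my-project"
TEMPLATE_PATH = "gs://my-templates/ingest-csv"
# Hypothetical option name: must match the runtime parameter your pipeline declares.
INPUT_PARAM = "inputFile"


def build_launch_request(bucket, name):
    """Return (url, body) for a templates:launch call for one new file."""
    # Dataflow job names only allow lowercase letters, digits and hyphens,
    # so derive a safe name from the object name.
    job_name = re.sub(r"[^a-z0-9]+", "-", ("ingest-" + name).lower())
    job_name = job_name.strip("-")[:40].rstrip("-")

    url = "https://dataflow.googleapis.com/v1b3/projects/%s/templates:launch?%s" % (
        PROJECT_ID,
        urlencode({"gcsPath": TEMPLATE_PATH}),
    )
    body = {
        "jobName": job_name,
        # Pass the new file to the pipeline as a template parameter.
        "parameters": {INPUT_PARAM: "gs://%s/%s" % (bucket, name)},
    }
    return url, body


def on_new_file(event, context=None):
    """Entry point shaped like a storage-triggered background function.

    `event` is expected to carry the 'bucket' and 'name' of the new object.
    """
    url, body = build_launch_request(event["bucket"], event["name"])
    # An authenticated POST of json.dumps(body) to url would go here.
    return url, json.dumps(body)
```

Calling `on_new_file({"bucket": "my-bucket", "name": "data/2017-01.csv"})` returns the launch URL and a JSON body whose `inputFile` parameter points at the new file. Keeping the request building separate from the HTTP call makes it easy to check locally before deploying.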

See here for a walkthrough on how to build this pipeline. Full disclosure: I work for Shine.

Upvotes: 2
