Thierry Falvo

Reputation: 6290

Can Cloud Dataflow streaming job scale to zero?

I'm using Cloud Dataflow streaming pipelines to insert events received from Pub/Sub into a BigQuery dataset. I need several of them to keep each job simple and easy to maintain.

My concern is the overall cost. The volume of data is not very high, and during some periods of the day there is no data at all (no messages on Pub/Sub).

I would like Dataflow to scale to 0 workers until a new message is received, but it seems the minimum is 1 worker.

So the minimum price for each job for a day would be 24 vCPU-hours, i.e. at least $50 a month per job (before any discount for monthly usage).

I plan to start and drain my jobs via the API a few times per day to avoid one full-time worker, but this doesn't seem like the right approach for a managed service like Dataflow.
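For reference, draining a streaming job is a `jobs.update` call on the Dataflow REST API with `requestedState` set to `JOB_STATE_DRAINED`. A minimal sketch of what that request looks like (the project, region, and job IDs below are placeholders; the HTTP call itself, which needs credentials, is left out):

```python
# Sketch of the Dataflow v1b3 jobs.update request used to drain a job.
# Only the URL/body construction is shown; sending the request would
# require an authenticated HTTP client (e.g. google-auth).
DATAFLOW_API = "https://dataflow.googleapis.com/v1b3"

def drain_request(project_id, region, job_id):
    """Return the (url, body) pair for a jobs.update call that drains a job."""
    url = f"{DATAFLOW_API}/projects/{project_id}/locations/{region}/jobs/{job_id}"
    body = {"requestedState": "JOB_STATE_DRAINED"}
    return url, body

url, body = drain_request("my-project", "us-central1", "2017-10-01_00_00_00-123")
print(body["requestedState"])  # → JOB_STATE_DRAINED
```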

Is there something I missed?

Upvotes: 5

Views: 1982

Answers (2)

Héctor Neri

Reputation: 1452

Dataflow can't scale to 0 workers, but your alternatives would be to use Cron or Cloud Functions to create a Dataflow streaming job whenever an event triggers it. For stopping the Dataflow job by itself, you can read the answers to this question.

You can find an example here for both cases (Cron and Cloud Functions). Note that Cloud Functions is no longer in Alpha; since July it has been in General Availability.
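To illustrate the Cloud Functions route: the function's handler would assemble a request body for the Dataflow `templates.launch` method and submit it (via `googleapiclient.discovery.build("dataflow", "v1b3")`). A minimal sketch of just the body construction; the job name, topic, and table below are made-up placeholders:

```python
# Hypothetical sketch of the body a Cloud Function would pass to the
# Dataflow v1b3 templates.launch method when a triggering event arrives.
# The actual launch call (via googleapiclient) is omitted:
#   service.projects().locations().templates().launch(
#       projectId=..., location=..., gcsPath=..., body=body).execute()
def build_launch_body(job_name, template_params):
    """Assemble the request body for templates.launch."""
    return {"jobName": job_name, "parameters": template_params}

body = build_launch_body(
    "pubsub-to-bq-ondemand",
    {"inputTopic": "projects/my-project/topics/events",
     "outputTable": "my-project:dataset.events"},
)
print(body["jobName"])  # → pubsub-to-bq-ondemand
```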

Upvotes: 4

Ryan McDowell

Reputation: 853

A streaming Dataflow job must always have at least one worker. If the volume of data is very low, batch jobs may fit the use case better: using a scheduler or cron, you can periodically start a batch job that drains the topic, which will save on cost.
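The batch-drain idea could look something like this: a scheduled job pulls whatever is on the subscription and groups the payloads into row batches for BigQuery's `insertAll` (which accepts up to 500 rows per request). The Pub/Sub pull and BigQuery insert calls themselves are omitted; only the batching step is sketched, and the payload format is assumed to be JSON:

```python
# Minimal sketch, assuming each Pub/Sub message payload is a JSON object.
# A real scheduled job would pull these with the Pub/Sub client library
# and send each batch via the BigQuery tabledata.insertAll method.
import json

def to_insert_batches(messages, batch_size=500):
    """Group raw message payloads into insertAll-sized row batches."""
    rows = [{"json": json.loads(m)} for m in messages]
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

batches = to_insert_batches(['{"event": "a"}', '{"event": "b"}'], batch_size=1)
print(len(batches))  # → 2
```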

Upvotes: 0
