Reputation: 4782
I am a bit confused about some aspects of Dataflow pricing for streaming pipelines.
I have a pipeline whose final step loads data into BigQuery using the FILE_LOADS method with triggering_frequency set. However, setting triggering_frequency seems to require the pipeline to run in streaming mode, and that is the only reason I need to make it a streaming pipeline. Everything else is purely batch, and the pipeline's data source (another BigQuery table) is also bounded.
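For reference, the setup described above looks roughly like the following in the Beam Python SDK. This is a minimal sketch, not the asker's actual code. The project, region, bucket, and table names are hypothetical placeholders, and the target table is assumed to already exist (hence CREATE_NEVER). Beam is imported inside build_pipeline so that the two helper functions can be inspected without Beam installed.

```python
# Minimal sketch, assuming the Apache Beam Python SDK (apache-beam[gcp]).
# Project, region, bucket, and table names are hypothetical placeholders.
from typing import List, Optional

SOURCE_TABLE = "my-project:my_dataset.source_table"  # hypothetical bounded source
TARGET_TABLE = "my-project:my_dataset.target_table"  # hypothetical sink


def pipeline_args(streaming: bool) -> List[str]:
    """Flags passed to PipelineOptions (hypothetical project/region/bucket)."""
    args = [
        "--runner=DataflowRunner",
        "--project=my-project",
        "--region=us-central1",
        "--temp_location=gs://my-bucket/tmp",
    ]
    if streaming:
        # Only needed because triggering_frequency is set on the sink.
        args.append("--streaming")
    return args


def sink_kwargs(triggering_frequency: Optional[int]) -> dict:
    """Keyword arguments for WriteToBigQuery that differ between the two modes."""
    kwargs = {"table": TARGET_TABLE}
    if triggering_frequency is not None:
        # Seconds between rounds of BigQuery load jobs.
        kwargs["triggering_frequency"] = triggering_frequency
    return kwargs


def build_pipeline(streaming: bool, triggering_frequency: Optional[int]):
    """Wire the bounded BigQuery read to a FILE_LOADS write (not run here)."""
    # Imported lazily so the helpers above work without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    pipeline = beam.Pipeline(options=PipelineOptions(pipeline_args(streaming)))
    (
        pipeline
        | "ReadBounded" >> beam.io.ReadFromBigQuery(table=SOURCE_TABLE)
        | "LoadToBigQuery" >> beam.io.WriteToBigQuery(
            method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            **sink_kwargs(triggering_frequency),
        )
    )
    return pipeline
```

Calling build_pipeline(streaming=True, triggering_frequency=300) gives the variant described in the question; build_pipeline(streaming=False, triggering_frequency=None) is the plain batch version, where FILE_LOADS issues its load jobs after the bounded input has been written.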
If I enable --streaming, how would that affect the pricing of this pipeline? According to the pricing page, the following are billed:
The volume of data ingested into your streaming pipeline
The complexity of the pipeline
The number of pipeline stages with shuffle operation or with stateful DoFns
My question is: will all of these charges also apply to the earlier steps/DoFns in my pipeline, even though those steps operate on bounded data?
Upvotes: 0
Views: 142
Reputation: 1383
Yes, they will apply to the whole pipeline.
Your cost should still be roughly the same, since neither your data volume nor your pipeline has changed. The triggering_frequency only changes how often a load job is triggered.
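To make that point concrete: the amount of data is fixed, and triggering_frequency only decides how many rounds of load jobs the run is split into. The helper below is a hypothetical back-of-the-envelope estimate, assuming roughly one firing per interval; in practice a firing with no new data starts no job, and a large firing may be split into several load jobs.

```python
import math


def approx_load_rounds(run_seconds: float, triggering_frequency: float) -> int:
    """Rough number of load-job rounds for a run of the given length.

    Hypothetical estimate: assumes one firing per triggering_frequency
    interval and ignores empty firings and per-firing job splitting.
    """
    if triggering_frequency <= 0:
        raise ValueError("triggering_frequency must be positive")
    return math.ceil(run_seconds / triggering_frequency)


# One hour of processing with a 5-minute (300 s) frequency.
print(approx_load_rounds(3600, 300))  # 12
```

Halving the frequency roughly doubles the number of rounds, but each round loads a correspondingly smaller share of the same data.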
Why do you need to set this frequency, though? Does the default behavior not work for your batch job? I'm also not sure how the pipeline would terminate in this setup. Would you have to cancel it once everything is processed?
Upvotes: 1