I have a Dataflow job that communicates with external resources. The problem is that these external resources are slower than the Dataflow job, so they are constantly saturated. I need some way to reduce the number of messages read from Pub/Sub, or some other way to lower the job's throughput, so that less traffic reaches the external resources.
Thanks.
We currently do not support throttling primitives (such as "make sure this DoFn is throttled to at most X calls per second over the whole job"). However, we know it is an important use case, and it will most likely be supported sooner or later.
Meanwhile, your best bet is, as Ryan said, to limit the number of workers and worker threads: specify --numWorkers (or --maxNumWorkers if you are using autoscaling) and --numberOfWorkerHarnessThreads. Note, however, that this will cause a backlog of input messages to build up rather than dropping them. It is hard to tell which is better for your use case.
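To see why capping workers and threads reduces the load on the external service, and how fast the backlog grows, here is a rough back-of-the-envelope model. It is a sketch under loud assumptions, not anything Dataflow computes: it assumes each harness thread processes one element at a time and makes exactly one synchronous (blocking) external call per element, with no extra threads or batching inside the DoFn. The class `ThrottleEstimate`, its methods, and the sample numbers are hypothetical and only illustrate the arithmetic.

```java
/**
 * Hypothetical back-of-the-envelope model, not part of any Dataflow API.
 * Assumes one blocking external call per element and one element per
 * harness thread at a time.
 */
public class ThrottleEstimate {

    // Upper bound on simultaneous external calls: every harness thread on
    // every worker is busy with at most one call at a time.
    public static int maxConcurrentCalls(int maxNumWorkers, int harnessThreadsPerWorker) {
        return maxNumWorkers * harnessThreadsPerWorker;
    }

    // Upper bound on calls per second, given the average latency of one call.
    public static double maxCallsPerSecond(int maxNumWorkers, int harnessThreadsPerWorker,
                                           double callLatencySeconds) {
        return maxConcurrentCalls(maxNumWorkers, harnessThreadsPerWorker) / callLatencySeconds;
    }

    // Messages that arrive faster than the capped rate are not dropped;
    // under this model they accumulate as Pub/Sub backlog at this rate.
    public static double backlogGrowthPerSecond(double inputPerSecond, double cappedCallsPerSecond) {
        return Math.max(0.0, inputPerSecond - cappedCallsPerSecond);
    }

    public static void main(String[] args) {
        int workers = 2;       // e.g. --maxNumWorkers=2 (assumed value)
        int threads = 4;       // e.g. --numberOfWorkerHarnessThreads=4 (assumed value)
        double latency = 0.5;  // seconds per external call (assumed)
        double input = 50.0;   // messages per second arriving from Pub/Sub (assumed)

        double cap = maxCallsPerSecond(workers, threads, latency);
        System.out.println("max concurrent calls: " + maxConcurrentCalls(workers, threads));
        System.out.println("max calls/s: " + cap);
        System.out.println("backlog growth (msgs/s): " + backlogGrowthPerSecond(input, cap));
    }
}
```

With these sample numbers the model gives at most 8 concurrent calls and 16 calls per second, so an input rate of 50 messages per second would grow the backlog by roughly 34 messages per second. If the DoFn batches requests or issues asynchronous calls, the real ceiling differs from this estimate.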