bsmarcosj

Reputation: 1720

Is it possible to reduce the throughput of my pipeline?

I have a Dataflow job that communicates with external resources. The problem is that these external resources are slower than the Dataflow job, so they are always saturated. I need some way to reduce the number of messages read from PubSub, or otherwise reduce the throughput of the job, in order to reduce the traffic to the external resources.

Thanks.

Upvotes: 1

Views: 318

Answers (1)

jkff

Reputation: 17913

We currently do not support throttling primitives (such as "make sure this DoFn is throttled to at most X calls per second over the whole job"), however we know it is an important use case and it will most likely be supported sooner or later.

Meanwhile, your best bet is, as Ryan said, to limit the number of workers and worker threads: specify --numWorkers (or --maxNumWorkers if you are using autoscaling) and --numberOfWorkerHarnessThreads. However, note that this will lead to a backlog of input messages building up, rather than messages being dropped. It is hard to tell which is better in your use case.
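For illustration, a minimal sketch of setting those options programmatically in a Beam Java pipeline (the specific values and the surrounding pipeline setup are placeholders, not a recommendation):

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class ThrottledPipeline {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation()
            .as(DataflowPipelineOptions.class);

    // Cap the worker pool size (use setNumWorkers if autoscaling is disabled).
    options.setMaxNumWorkers(2);

    // Limit the number of processing threads per worker, which bounds the
    // number of concurrent calls your DoFns can make to the external service.
    options.as(DataflowPipelineDebugOptions.class)
        .setNumberOfWorkerHarnessThreads(4);

    Pipeline p = Pipeline.create(options);
    // ... build and run the pipeline as usual ...
    p.run();
  }
}
```

Equivalently, you can pass --maxNumWorkers=2 --numberOfWorkerHarnessThreads=4 on the command line. The product of workers and harness threads is an upper bound on in-flight calls, so tune it to what the external resources can sustain.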

Upvotes: 2
