user699848
user699848

Reputation: 358

Apache Flink limit data being processed

I am looking for an option wherein I can limit the amount of data that is being currently processed.

Use case: I am reading from a Kafka data stream and processing that data, and I want to limit the number of messages that are in-flight. The reason for doing this is the throughput of the third-party application. Generally it's not an issue, but in the scenarios of backpressure, there are frequent failures and application restarts because of these failures.

Upvotes: 1

Views: 1551

Answers (1)

David Anderson
David Anderson

Reputation: 43499

Some tools available for this are:

  1. Limit the parallelism.
  2. Use Flink's async i/o operator to handle the connection to the 3rd party API, and set its capacity (number of in-flight requests) accordingly. This will ultimately backpressure the sources.
  3. Apply rate-limiting to the sources. See https://stackoverflow.com/a/65232295/2000823 and https://stackoverflow.com/a/59027848/2000823 for more on that topic.

Upvotes: 1

Related Questions