Pango
Pango

Reputation: 83

How does Apache Flink implement iteration?

DAG (directed acyclic graph) execution of big data is common. I am wondering how Apache Flink implements iterations given that it's possible that the graph can be cyclic.

Upvotes: 7

Views: 1680

Answers (1)

Matthias J. Sax
Matthias J. Sax

Reputation: 62330

If Flink executes iterative programs, the dataflow graph is not a DAG but allows for cycles. However, this cycles are not arbitrary and must follow a certain pattern to allow Flink to control this cyclic flow to some extent.

There is often no strict technical reason in other systems for not supporting cycles. Allowing for cycles in a generic way is usually prohibited because it might result in an infinite loop (ie, that a tuple spins the cycle forever and the program does not terminate).

Flink tracks the cycle by counting the number of iterations. This way, Flink can track which tuples belong to which iterations and can, for example, avoid tuples from a new iteration "taking over" tuples from an older one. Furthermore, it allows Flink to detect if the result of iteration n and n+1 are equal or not. An equal result indicates a finished computation allowing Flink to break the infinite loop and terminate (this holds for so-called fix-point iterations).

For a detailed read look at this research paper: https://dl.acm.org/citation.cfm?id=2350245

The usage of iteration in your program is described here: https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/programming_guide.html#iteration-operators

Upvotes: 6

Related Questions