What is the difference between a "stateful" and "stateless" system?

Question

Apache Spark brags that its operators (nodes) are "stateless". This allows Spark's architecture to use simpler protocols for things like recovery, load balancing, and handling stragglers.

On the other hand, Apache Flink describes its operators as "stateful" and claims that statefulness is necessary for applications like machine learning. Yet Spark programs are able to pass information forward and maintain application data in RDDs without maintaining "state".
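
To make the contrast concrete, here is a minimal sketch of how I currently picture the two styles. It is plain Python, not the Spark or Flink API, and every name in it (`stateless_count`, `StatefulCounter`, `batches`) is hypothetical and made up for illustration. It may well misrepresent what either project means by the terms, which is part of what I am asking.

```python
from typing import Dict, Iterable, List, Tuple


# Hypothetical "stateless" operator: a pure function of its inputs.
# The running totals are passed in and a new dict is returned;
# the operator itself remembers nothing between calls.
def stateless_count(batch: Iterable[str], previous: Dict[str, int]) -> Dict[str, int]:
    result = dict(previous)  # copy, so the input is not mutated
    for word in batch:
        result[word] = result.get(word, 0) + 1
    return result


# Hypothetical "stateful" operator: the running totals live inside
# the operator and are mutated as each record arrives.
class StatefulCounter:
    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def process(self, word: str) -> Tuple[str, int]:
        self.counts[word] = self.counts.get(word, 0) + 1
        return word, self.counts[word]


batches: List[List[str]] = [["a", "b"], ["a"]]

# Stateless style: totals are ordinary data handed from step to step.
totals: Dict[str, int] = {}
for batch in batches:
    totals = stateless_count(batch, totals)

# Stateful style: totals are hidden inside the operator.
counter = StatefulCounter()
for batch in batches:
    for word in batch:
        counter.process(word)

print(totals == counter.counts)
```

Both versions end up with the same counts, which is exactly why the labels confuse me: the information is carried forward either way, and the only difference I can see is whether it lives in the data or in the operator.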

What is happening here? Is Spark not a truly stateless system? Or is Flink's assertion that statefulness is essential for machine learning and similar applications incorrect? Or is there some additional nuance here?

I don't feel like I truly grok the difference between "stateful" and "stateless" systems, and I would appreciate it if the distinction could be explained.
