Bogdan Vakulenko

Reputation: 3390

Spark structured streaming stateless mode

Is it possible to use Spark Structured Streaming aggregations without keeping state? For example, I want to count words within each batch only, without taking previous batches into account. I know there are functions like flatMapGroups and mapGroups that allow this, but that doesn't seem to be the native approach, and it has drawbacks.

What is the canonical way of doing this in Spark? Should I use DStreams instead?

Upvotes: 1

Views: 444

Answers (1)

Akhil Bojedla

Reputation: 2218

Spark Structured Streaming is not a good fit for you in this case. Use DStreams instead. As a workaround, however, you can use (flat)mapGroupsWithState and set timeoutConf to GroupStateTimeout.ProcessingTimeTimeout(). Then set a timeout of "0 seconds" on the state so that it gets evicted eventually.
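
To illustrate the DStream suggestion, here is a minimal PySpark sketch of a word count that is computed per batch only. It relies on the fact that reduceByKey on a DStream combines records within the current batch interval and carries nothing over to the next one. The socket source, host, port, batch interval, and the helper names (split_words, per_batch_word_counts, main) are assumptions made for the example, not something from the question.

```python
from operator import add


def split_words(line):
    """Split one input line into whitespace-separated words."""
    return line.split()


def per_batch_word_counts(lines):
    """Turn a DStream of text lines into per-batch (word, count) pairs.

    reduceByKey on a DStream is applied to each batch's RDD separately,
    so counts from earlier batches are never merged in.
    """
    return lines.flatMap(split_words).map(lambda w: (w, 1)).reduceByKey(add)


def main(host="localhost", port=9999, batch_seconds=5):
    # Imported here so the helpers above stay usable without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="PerBatchWordCount")
    ssc = StreamingContext(sc, batch_seconds)

    # Hypothetical source: a text socket, e.g. fed by `nc -lk 9999`.
    lines = ssc.socketTextStream(host, port)

    # Print the counts of each batch; every batch starts from zero.
    per_batch_word_counts(lines).pprint()

    ssc.start()
    ssc.awaitTermination()
```

Calling main() starts the job. Keeping counts across batches would require an explicit stateful operator such as updateStateByKey, which this sketch deliberately leaves out.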

Upvotes: 2
