Reputation: 3390
Is it possible to use Spark Structured Streaming aggregations without keeping state? For example, I want to count words in each batch only, without taking previous batches into account. I know there are functions like flatMapGroups and mapGroups that allow doing things like that, but it doesn't seem to be a native approach and it has drawbacks.
What is the canonical way of doing this in Spark? Should I use DStreams instead?
Upvotes: 1
Views: 444
Reputation: 2218
Spark Structured Streaming is not for you in this case. Use DStreams instead. However, as a workaround, you can use (flat)mapGroupsWithState and set the timeoutConf to GroupStateTimeout.ProcessingTimeTimeout(). Then set a timeout of "0 seconds" on the state (state.setTimeoutDuration) so that it eventually gets evicted.
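A minimal Scala sketch of the (flat)mapGroupsWithState workaround follows. It assumes Spark 2.2 or later, a text socket source on localhost:9999 (for example fed by `nc -lk 9999`), and a hypothetical object name, PerBatchWordCount. It relies on the fact that the group function only receives the rows that arrived for that key in the current trigger, so counting the iterator gives a per-batch count. Rather than evicting state through a timeout, this sketch never calls state.update, so nothing is carried over between batches. The ProcessingTimeTimeout configuration is kept only to mirror the suggestion above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical example object; not part of any Spark API.
object PerBatchWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("PerBatchWordCount")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Assumption: a plain-text socket source on localhost:9999.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .as[String]

    val words = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)

    val perBatchCounts = words
      .groupByKey(word => word)
      .flatMapGroupsWithState[Long, (String, Long)](
        OutputMode.Update(),
        GroupStateTimeout.ProcessingTimeTimeout()) {
        (word: String, occurrences: Iterator[String], state: GroupState[Long]) =>
          if (state.hasTimedOut) {
            // Defensive only: no timeout is ever set below, but if a key
            // times out, drop whatever state it has and emit nothing.
            state.remove()
            Iterator.empty
          } else {
            // `occurrences` holds only this trigger's rows for `word`,
            // so this is a per-batch count. state.update is never called,
            // so no state is kept for later batches.
            Iterator((word, occurrences.size.toLong))
          }
      }

    perBatchCounts.writeStream
      .outputMode(OutputMode.Update())
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

Each trigger should print only the words seen in that micro-batch, along with their counts. A word that does not reappear in a later batch is not printed again.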
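For comparison, here is the DStream route recommended above. A plain reduceByKey on a DStream only combines values within a single batch interval, and state is kept across batches only if you opt in with updateStateByKey or mapWithState. The sketch below assumes the same socket source on localhost:9999, a 10-second batch interval, and a hypothetical object name, DStreamPerBatchWordCount.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical example object; not part of any Spark API.
object DStreamPerBatchWordCount {
  def main(args: Array[String]): Unit = {
    // local[2]: the socket receiver occupies one core, processing needs another.
    val conf = new SparkConf().setAppName("DStreamPerBatchWordCount").setMaster("local[2]")
    // Assumption: 10-second batches.
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines = ssc.socketTextStream("localhost", 9999)

    // reduceByKey on a DStream aggregates within each batch only;
    // no state is carried over to the next batch.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1L))
      .reduceByKey(_ + _)

    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```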
Upvotes: 2