mattweyant

Reputation: 1058

DStream updateStateByKey update function implementation

After reading through the documentation, the source and the examples, I am trying to understand the different method signatures of updateStateByKey and when using one would be more appropriate than another.

Specifically, I don't understand the following API:

def updateStateByKey[S: ClassTag](
  updateFunc: (Iterator[(K, Seq[V], Option[S])]) => Iterator[(K, S)],
  ...
)

In what situations would I create an updateFunc that takes and returns an Iterator rather than implementing a (Seq[V], Option[S]) => Option[S] function?

Upvotes: 2

Views: 403

Answers (1)

maasg

Reputation: 37435

While (Seq[V], Option[S]) => Option[S] lets you "see" only the previous state (if any) and the current values for a key, it does not give you access to the key itself.

Using (Iterator[(K, Seq[V], Option[S])]) => Iterator[(K, S)], you can make decisions based on the key as well: "have I seen this key?", "have I seen all these keys?", comparing keys with values in your decision logic, or preserving just a subset of the keys (e.g. "top-k").
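
To make the difference concrete, here is a minimal sketch of the two function shapes written as plain Scala values, so it runs without a Spark context. The names runningSum and sumSkippingTmpKeys, the String/Int type choices, and the "tmp-" prefix rule are all made up for illustration; they are not part of the Spark API.

```scala
// Per-key form: receives only the new values and the previous state.
// The key is not visible here, so every key is handled the same way.
val runningSum: (Seq[Int], Option[Int]) => Option[Int] =
  (newValues, state) => Some(state.getOrElse(0) + newValues.sum)

// Iterator form: each element carries the key as well, so the function
// can treat keys differently or drop some of them from the state.
// Hypothetical policy: keys starting with "tmp-" are never kept.
val sumSkippingTmpKeys
    : Iterator[(String, Seq[Int], Option[Int])] => Iterator[(String, Int)] =
  entries =>
    entries.collect {
      case (key, newValues, state) if !key.startsWith("tmp-") =>
        (key, state.getOrElse(0) + newValues.sum)
    }
```

Either value should be usable as the updateFunc argument of the matching updateStateByKey overload on a DStream[(String, Int)], provided you also supply the remaining parameters that the signature in the question elides. The key-based filter is something the per-key form cannot express, since it never sees the key.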

Upvotes: 2
