Reputation: 1058
After reading through the documentation, the source and the examples, I am trying to understand the different method signatures of updateStateByKey and when using one would be more appropriate than another.
Specifically, I don't understand the following API:
def updateStateByKey[S: ClassTag](
    updateFunc: (Iterator[(K, Seq[V], Option[S])]) => Iterator[(K, S)],
    ...
)
In what situations would I create an updateFunc that takes and returns an Iterator, rather than implementing a (Seq[V], Option[S]) => Option[S] function?
Upvotes: 2
Views: 403
Reputation: 37435
While (Seq[V], Option[S]) => Option[S] lets you "see" only the previous state (if any) and the current values for a key, it does not give you access to the key itself.
Using (Iterator[(K, Seq[V], Option[S])]) => Iterator[(K, S)], you can make decisions based on the key as well: "have I seen this key?", "have I seen all of these keys?". You can also compare keys with values in your decision logic, or preserve just a subset of the keys (e.g. "top-k").
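To make the difference concrete, here is a minimal sketch of the two function shapes, exercised on plain Scala collections so it runs without Spark. The names `runningSum` and `sumUnlessTemporary`, the `"tmp."` prefix rule, and the sample data are all made up for illustration; they are not part of the Spark API.

```scala
object UpdateFuncShapes {
  // Per-key form: receives only the new values and the previous state.
  // The key is not visible here.
  val runningSum: (Seq[Int], Option[Int]) => Option[Int] =
    (values, state) => Some(state.getOrElse(0) + values.sum)

  // Iterator form: each element carries the key as well, so the function
  // can keep or drop state based on the key.
  // Hypothetical rule: keys starting with "tmp." are not kept in the state.
  val sumUnlessTemporary
      : Iterator[(String, Seq[Int], Option[Int])] => Iterator[(String, Int)] =
    entries =>
      entries.collect {
        case (key, values, state) if !key.startsWith("tmp.") =>
          (key, state.getOrElse(0) + values.sum)
      }

  def main(args: Array[String]): Unit = {
    // Sample input shaped like what the iterator form receives:
    // (key, new values in this batch, previous state if any).
    val batch: Iterator[(String, Seq[Int], Option[Int])] = Iterator(
      ("user.a", Seq(1, 2), Some(10)),
      ("tmp.b", Seq(5), None),
      ("user.c", Seq.empty[Int], Some(3))
    )

    println(runningSum(Seq(1, 2), Some(10)))  // Some(13)
    println(sumUnlessTemporary(batch).toList) // List((user.a,13), (user.c,3))
  }
}
```

With a real DStream, the first function would be passed to the single-argument overload, and the second to the overload that also takes a `Partitioner` and a `rememberPartitioner` flag. As far as I can tell, the iterator form is invoked once per partition, so something like "top-k" computed this way would be per partition rather than global, unless the partitioner puts all the relevant keys together.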
Upvotes: 2