Reputation: 363
Suppose Spark Streaming receives a 50-line message with a batch interval of 10 seconds. After 40.5 lines have arrived, the 10 seconds are up and the rest falls into the next 10-second interval, so the first 40.5 lines form one RDD that is processed first. In my use case the first 40 lines make sense, but the following half line does not, and the same is true of the half line at the start of the second RDD. Is my question even valid? Please advise how to handle this.
Thanks Bill.
Upvotes: 2
Views: 258
Reputation:
This cannot happen. Either an element has been received and is part of the current window, or it hasn't and will be included in the next one. File-based sources require atomic file creation, so a situation where only part of a file is loaded is simply not possible.
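
To illustrate the point, here is a toy model in plain Python. It is not Spark's API, only a sketch of the idea that a record is the smallest unit assigned to an interval. It assumes the source delivers one complete line per record, and the function name `assign_to_batches` and the arrival rate of one line every 0.25 seconds are made up for the example.

```python
from collections import defaultdict


def assign_to_batches(records, batch_interval):
    """Group (arrival_time, line) pairs into batches by arrival time.

    Hypothetical helper: each record is indivisible, so it lands in
    exactly one batch and is never split across two.
    """
    batches = defaultdict(list)
    for arrival_time, line in records:
        batch_index = int(arrival_time // batch_interval)
        batches[batch_index].append(line)
    return dict(batches)


# Assumed input: a 50-line message, one whole line arriving every 0.25 s.
records = [(i * 0.25, f"line {i + 1}") for i in range(50)]
batches = assign_to_batches(records, batch_interval=10.0)

for index in sorted(batches):
    print(index, len(batches[index]), batches[index][0], "...", batches[index][-1])
```

With these assumed arrival times the first interval holds lines 1 to 40 and the second holds lines 41 to 50. A line that arrives at the boundary goes entirely into the next interval, so there is no half line in either one. If the 50 lines must be processed together, that has to be handled at the application level, because they can still end up in two different batches.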
Upvotes: 3