Reputation: 173
I am new to Spark Streaming and am trying to understand the updateStateByKey operation. What is it used for? Why is it necessary to store arbitrary state? How does it work?
Upvotes: 1
Views: 1036
Reputation: 13946
The updateStateByKey method lets you build and maintain per-key state from the data arriving in the stream.
For example, if you have weather sensors that send their current status (such as wind speed and temperature) for a given sensor_id in the format (sensor_id, (timestamp, values)), you can use updateStateByKey to build a stream that represents the current weather state across all sensors, like [(sensor_1, current_weather_data), (sensor_2, current_weather_data)].
Then you can join the stream with other data, and even if a sensor didn't send any information in the last batch interval, the state will still contain its last value. I used this method in this notebook.
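To make the sensor example concrete, here is a minimal sketch in plain Python that simulates the semantics rather than running Spark itself. The update function has the same shape as the one PySpark's updateStateByKey expects (the new values for a key in the current batch, plus the previous state for that key). The names update_sensor_state and apply_batch and the sample readings are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Reading = Tuple[int, dict]  # (timestamp, values), as in the answer's format


def update_sensor_state(new_readings: List[Reading],
                        last_state: Optional[Reading]) -> Optional[Reading]:
    """Hypothetical update function: keep the most recent reading per sensor.

    Same shape as the function passed to updateStateByKey in PySpark:
    (new values for this key in the batch, previous state or None).
    """
    if not new_readings:
        # No data for this sensor in this batch: keep the last known value.
        return last_state
    latest = max(new_readings, key=lambda r: r[0])
    if last_state is not None and last_state[0] > latest[0]:
        # Ignore late data that is older than what we already have.
        return last_state
    return latest


def apply_batch(state: Dict[str, Reading],
                batch: List[Tuple[str, Reading]]) -> Dict[str, Reading]:
    """Simulate one micro-batch (illustration only, not Spark code).

    The update function is called for every key that is in the batch
    or already in the state; returning None drops the key.
    """
    grouped: Dict[str, List[Reading]] = {}
    for sensor_id, reading in batch:
        grouped.setdefault(sensor_id, []).append(reading)
    new_state: Dict[str, Reading] = {}
    for key in set(state) | set(grouped):
        updated = update_sensor_state(grouped.get(key, []), state.get(key))
        if updated is not None:
            new_state[key] = updated
    return new_state


state: Dict[str, Reading] = {}
# Batch 1: both sensors report.
state = apply_batch(state, [("sensor_1", (1, {"temp": 20.5})),
                            ("sensor_2", (1, {"temp": 18.0}))])
# Batch 2: only sensor_1 reports; sensor_2 keeps its last value.
state = apply_batch(state, [("sensor_1", (2, {"temp": 21.0}))])
print(state)
```

In a real PySpark job the same function would be passed as stream.updateStateByKey(update_sensor_state) on a DStream of (sensor_id, (timestamp, values)) pairs. Note that Spark requires a checkpoint directory to be set on the streaming context before this operation can be used.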
Upvotes: 2