Ravi

Reputation: 173

updateStateByKey - PySpark - Spark Streaming

I am new to Spark Streaming and am trying to understand the updateStateByKey operation. What is it used for? Why is it necessary to store arbitrary state? How does it work?

Upvotes: 1

Views: 1036

Answers (1)

Mariusz

Reputation: 13946

The updateStateByKey method allows you to create state information based on data coming from the stream.

For example, if you have weather sensors that send their current status (such as wind speed and temperature) for a given sensor_id in the format (sensor_id, (timestamp, values)), you can use updateStateByKey to build a stream that represents the current weather state across all sensors, like [(sensor_1, current_weather_data), (sensor_2, current_weather_data)].

You can then join this stream with other data, and even if a sensor didn't send anything in the last window, the state will still contain its last value. I used this method in this notebook.
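To make the "state keeps the last value" behaviour concrete, here is a minimal sketch. The update function has the shape updateStateByKey expects: it receives the list of new values for a key in the current batch plus the previous state (None for a new key), and returns the new state. The names update_weather and run_batches and the sample readings are my own illustration, not from the notebook, and run_batches is only a plain-Python stand-in for what Spark does per batch, so the snippet runs without a cluster.

```python
from collections import defaultdict


def update_weather(new_values, last_state):
    """Update function in the shape updateStateByKey expects.

    new_values: list of (timestamp, values) tuples that arrived for one key
                in the current batch (empty if the sensor was silent)
    last_state: previous (timestamp, values) for that key, or None if new
    """
    if not new_values:
        # No new reading in this batch: keep whatever we had before.
        return last_state
    latest = max(new_values, key=lambda reading: reading[0])
    if last_state is not None and last_state[0] > latest[0]:
        # Ignore late-arriving readings older than the stored one.
        return last_state
    return latest


def run_batches(batches, update_func):
    """Hypothetical plain-Python simulation of the per-batch state update."""
    state = {}
    snapshots = []
    for batch in batches:
        grouped = defaultdict(list)
        for key, value in batch:
            grouped[key].append(value)
        new_state = {}
        # The function is called for keys with new data AND keys with old state.
        for key in set(state) | set(grouped):
            updated = update_func(grouped.get(key, []), state.get(key))
            if updated is not None:  # returning None drops the key
                new_state[key] = updated
        state = new_state
        snapshots.append(dict(state))
    return snapshots


# Assumed sample data: sensor_2 sends nothing in the second batch.
batches = [
    [("sensor_1", (1, {"temp": 20.5})), ("sensor_2", (1, {"temp": 18.0}))],
    [("sensor_1", (2, {"temp": 21.0}))],
]
snapshots = run_batches(batches, update_weather)
for snapshot in snapshots:
    print(sorted(snapshot.items()))

# With a real DStream of (sensor_id, (timestamp, values)) pairs, the same
# function would be used roughly like this (checkpointing must be enabled):
#   ssc.checkpoint("/some/checkpoint/dir")  # path is a placeholder
#   state_stream = readings.updateStateByKey(update_weather)
```

After the second batch, sensor_2 is still present in the state with its reading from the first batch, which is what makes the resulting stream safe to join against other data.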

Upvotes: 2
