Structured Streaming extract most recent values for each id

Question

I have a data stream containing ID, type, and value. For a group of users, each identified by an ID, I receive measurements (values) from different sensors (types). Example of incoming data:

ID type value
1  A    70
2  B    16
1  A    71
2  A    72

I need to create a Spark Structured Streaming app that will perform custom clustering of the incoming data. However, I am stuck at the beginning: I don't know how to build a dataset that contains the last measurement for each user and each type. This dataset must cover every user that has ever appeared in the system.

So, basically, for the data stream described above, I need a Structured Streaming app that gives me the set of last measurements for every user and every type:

  ID type value
  1  A    71
  2  B    16
  2  A    72
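
To make the requirement concrete, here is a minimal plain-Python sketch of the semantics I am after. It is not Spark code and not a streaming solution; it only shows the result I expect when the records arrive in order. The names `latest_per_key` and `incoming` are made up for this illustration.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, str, int]  # (ID, type, value)


def latest_per_key(records: Iterable[Record]) -> List[Record]:
    """Return the most recent value seen for each (ID, type) pair."""
    state: Dict[Tuple[int, str], int] = {}
    for user_id, sensor_type, value in records:
        # A later record overwrites the earlier one for the same key;
        # keys that never reappear keep their last value indefinitely.
        state[(user_id, sensor_type)] = value
    return [(uid, typ, val) for (uid, typ), val in state.items()]


# The example stream from above, in arrival order.
incoming = [(1, "A", 70), (2, "B", 16), (1, "A", 71), (2, "A", 72)]
result = latest_per_key(incoming)
# result == [(1, "A", 71), (2, "B", 16), (2, "A", 72)]
```

In the streaming app, the dictionary would have to be replaced by some state that Spark keeps per (ID, type) across micro-batches, which is the part I don't know how to do.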

Users may be inactive for some time, but I still need to keep their records. It would be useful if the output were a DataFrame.

Any ideas for how to do this will be very welcome.

P.S. I am fairly new to Spark Structured Streaming, so sorry if this is a trivial question.

