Soundsoul
Reputation: 65

Spark Streaming: updating a broadcast variable daily

I am writing a Spark Streaming app that compares incoming streaming data against a set of base data, which I broadcast to each computing node. However, the base data is updated daily, so I need to update the broadcast variable daily too. The base data resides on HDFS.

Is there a way to do this? The update does not depend on any streaming results; it just needs to happen at a fixed time, say 12:00 am every day. Moreover, if there is such a way, will the update block the Spark Streaming computing jobs?

Upvotes: 1

Views: 1620

Answers (1)

Ravi Reddy
Reputation: 186

Refer to the last answer in the thread you referred to. In summary: instead of broadcasting the data itself, broadcast the caching code that reloads the data at the needed interval.

  1. Create a CacheLookup object that reloads its data daily at 12:00 am
  2. Wrap that object in a Broadcast variable
  3. Use the CacheLookup inside the streaming logic
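
The steps above could be sketched roughly as follows. This is a minimal illustration in plain Python so that it runs without a cluster, not the code from the referenced thread. The method names, `load_fn`, and `clock` are hypothetical; in a real job `load_fn` would read the base data from HDFS. The object reloads lazily on the first lookup after the date changes, and when pickled it ships only the loader, not the data.

```python
import datetime
import threading


class CacheLookup:
    """Holds base data and reloads it on the first lookup of each new day."""

    def __init__(self, load_fn, clock=datetime.datetime.now):
        self._load_fn = load_fn   # hypothetical loader, e.g. reads a file on HDFS
        self._clock = clock       # injectable so the day rollover can be tested
        self._lock = threading.Lock()
        self._data = None
        self._loaded_on = None    # date of the last successful load

    def __getstate__(self):
        # Ship only the loader and clock; each process loads the data itself.
        return {"_load_fn": self._load_fn, "_clock": self._clock}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()
        self._data = None
        self._loaded_on = None

    def _refresh_if_stale(self):
        today = self._clock().date()
        if self._loaded_on != today:
            with self._lock:
                # Re-check inside the lock so only one thread reloads.
                if self._loaded_on != today:
                    self._data = self._load_fn()
                    self._loaded_on = today

    def get(self, key, default=None):
        self._refresh_if_stale()
        return self._data.get(key, default)
```

With PySpark, one would presumably wrap it as `lookup = sc.broadcast(CacheLookup(load_base_data))`, where `load_base_data` is a hypothetical loader function, and call `lookup.value.get(key)` inside the streaming transformations. In this sketch the reload runs synchronously inside the first lookup after midnight, so only the task that triggers it waits for the load. Other batches are not stopped, although each worker process pays the reload cost once per day.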

Upvotes: 3
