Reputation: 65
I am writing a spark streaming app with online streaming data compared to basic data which i broadcast into each computing node. However, since the basic data is updated daily, i need to update the broadcasted variable daily too. The basic data resides on hdfs.
Is there a way to do this? The update is not related to any online streaming results, just say at 12:00 am everyday. Moreover, if there is such a way, will the updating process block spark streaming computing jobs?
Upvotes: 1
Views: 1620
Reputation: 186
Refer to the last answer in the thread you referred. Summary - instead of sending the data, send the caching code to update data at the needed interval
Upvotes: 3