Reputation: 3608
I have implemented a Spark Streaming job that receives data from Kafka. An RDD loads data from a database so that operations can be performed against the incoming stream data. However, I want to refresh that RDD periodically to pick up any changes in the data source (the database). Is there any way to refresh or reload the data?
Upvotes: 0
Views: 292
Reputation: 2472
If you read the database inside a transform, you can also take the batch time as an argument:
.transform((rdd, time) => refreshDbTable(rdd, time))
Then, if you want to refresh every 15 minutes:
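import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Minutes, Time}

def refreshDbTable[T](rdd: RDD[T], time: Time): RDD[T] = {
  // true only when the batch time falls exactly on a 15-minute boundary
  if (time.isMultipleOf(Minutes(15))) {
    // drop the temp table
    // re-register the temp table
  }
  rdd // transform expects an RDD back
}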
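For illustration, here is a minimal sketch of the same idea where the reference data is kept as a cached pair RDD and joined against each batch. The object name `RefreshingLookup`, the helper `shouldRefresh`, the `(String, String)` record types and the `loadFromDb` loader are all assumptions made up for this example, not part of the question. It relies on the function passed to `transform` being evaluated on the driver for every batch. Note that the check can only fire if some batch time lands exactly on a 15-minute multiple, which should hold when the batch interval divides 15 minutes evenly.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Minutes, Time}
import org.apache.spark.streaming.dstream.DStream

object RefreshingLookup {
  // Cached reference data; only read and written on the driver.
  @volatile private var reference: Option[RDD[(String, String)]] = None

  // Reload on the first batch and whenever the batch time is a multiple of 15 minutes.
  def shouldRefresh(time: Time, loaded: Boolean): Boolean =
    !loaded || time.isMultipleOf(Minutes(15))

  // `loadFromDb` is a hypothetical caller-supplied loader (for example a JDBC read
  // mapped to key/value pairs); it is an assumption, not a Spark API.
  def enrich(
      stream: DStream[(String, String)],
      loadFromDb: () => RDD[(String, String)]
  ): DStream[(String, (String, String))] =
    stream.transform { (rdd: RDD[(String, String)], time: Time) =>
      if (shouldRefresh(time, reference.isDefined)) {
        reference.foreach(_.unpersist())       // release the stale cached copy
        reference = Some(loadFromDb().cache()) // load and cache the fresh copy
      }
      rdd.join(reference.get)                  // enrich the batch by key
    }
}
```

Between refreshes every batch joins against the same cached RDD, so the database is only queried when the refresh condition is met.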
Upvotes: 2
Reputation: 314
You can broadcast the RDD's contents and use a timer to update the broadcast periodically.
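A rough sketch of how this could look is below. Instead of a separate timer thread, it swaps in an elapsed-time check that runs whenever the value is requested, which keeps all Spark calls on the driver thread that schedules the batches. The class `Refreshable` and its parameters (`load`, `ttlMs`, `release`, `clock`) are hypothetical names introduced for this example, and `loadMapFromDb` in the usage comment is likewise an assumed helper; only `sc.broadcast`, `Broadcast.value` and `Broadcast.unpersist` are real Spark calls.

```scala
// Generic holder that reloads its value once `ttlMs` milliseconds have elapsed.
// `load`, `release` and `clock` are caller-supplied; none of them is a Spark API.
class Refreshable[T](
    load: () => T,
    ttlMs: Long,
    release: T => Unit = (_: T) => (),
    clock: () => Long = () => System.currentTimeMillis()
) {
  private var value: T = load()
  private var loadedAt: Long = clock()

  // Returns the current value, reloading it first if it is older than ttlMs.
  def get(): T = synchronized {
    val now = clock()
    if (now - loadedAt >= ttlMs) {
      release(value) // e.g. unpersist the old broadcast
      value = load()
      loadedAt = now
    }
    value
  }
}

// Possible use with a broadcast variable (assumes a SparkContext `sc`, a DStream
// `stream` of (String, String) pairs, and a hypothetical loadMapFromDb()):
//
//   val lookup = new Refreshable[Broadcast[Map[String, String]]](
//     load = () => sc.broadcast(loadMapFromDb()),
//     ttlMs = 15 * 60 * 1000L,
//     release = _.unpersist())
//
//   stream.foreachRDD { rdd =>
//     val b = lookup.get() // evaluated on the driver, once per batch
//     rdd.map { case (k, v) => (k, v, b.value.get(k)) }.count()
//   }
```

Fetching the broadcast handle inside `foreachRDD` (or `transform`) matters here: a handle captured once outside the per-batch function would keep pointing at the original broadcast.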
Upvotes: 0