syv

Reputation: 3608

Refreshing RDD in Spark Streaming

I have implemented a Spark Streaming job that receives data from Kafka. An RDD loads data from a database, and the incoming streaming data is processed against it. However, I want to refresh that RDD periodically to pick up any changes in the data source (the database). Is there any way to refresh or reload the data?

Upvotes: 0

Views: 292

Answers (2)

bp2010

Reputation: 2472

If you perform the reading of the database in a transform, you can also pass the time as an argument:

.transform((rdd, time) => refreshDbTable(rdd, time))

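Then, to refresh every 15 minutes:

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Minutes, Time}

def refreshDbTable[T](rdd: RDD[T], time: Time): RDD[T] = {
  // true only for batches whose time falls on a 15-minute boundary
  if (time.isMultipleOf(Minutes(15))) {
    // drop the temp table
    // re-register the temp table
  }
  rdd // transform expects an RDD back
}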
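
For the case in the question, where the reference data is itself an RDD rather than a temp table, the same time check could be sketched roughly as below. This is only an illustration: `RefreshableRdd`, `Enrich`, and the `load` function are hypothetical names, the `(String, String)` element types are an assumption, and it assumes the batch interval divides 15 minutes evenly so that some batch time lands on the boundary.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Minutes, Time}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical driver-side holder for the reference data.
// `load` is an assumed function that reads the database into an RDD.
class RefreshableRdd(load: () => RDD[(String, String)]) {
  private var current: RDD[(String, String)] = load().cache()

  // Meant to be called on the driver once per batch. Reloads when the
  // batch time falls on a 15-minute boundary, otherwise returns the
  // RDD that is already cached.
  def get(time: Time): RDD[(String, String)] = synchronized {
    if (time.isMultipleOf(Minutes(15))) {
      val old = current
      current = load().cache()
      // Drop the stale cached copy without blocking the batch.
      old.unpersist(blocking = false)
    }
    current
  }
}

object Enrich {
  // Joins each batch against whichever reference RDD is current for it.
  def withReference(
      events: DStream[(String, String)],
      reference: RefreshableRdd): DStream[(String, (String, String))] =
    events.transform((rdd, time) => rdd.join(reference.get(time)))
}
```

The function passed to `transform` is evaluated on the driver for every batch, which is why the reload can happen there.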

Upvotes: 2

Gaurav

Reputation: 314

You can broadcast the RDD's data and use a timer to update the broadcast periodically.
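
A rough sketch of that idea, assuming the reference data is small enough to load into a `Map` on the driver. `RefreshingBroadcast`, `load`, and `periodMinutes` are hypothetical names for illustration, not part of Spark.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicReference
import scala.util.control.NonFatal
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Hypothetical wrapper: holds the latest broadcast and replaces it on a timer.
// `load` is an assumed driver-side function that reads the database into a Map.
class RefreshingBroadcast(
    sc: SparkContext,
    load: () => Map[String, String],
    periodMinutes: Long) {

  private val current =
    new AtomicReference[Broadcast[Map[String, String]]](sc.broadcast(load()))

  // Non-daemon timer thread, so stop() has to be called on shutdown.
  private val timer = Executors.newSingleThreadScheduledExecutor()

  timer.scheduleAtFixedRate(new Runnable {
    // A failed reload keeps the previous broadcast and lets the timer continue.
    def run(): Unit = try refresh() catch { case NonFatal(_) => () }
  }, periodMinutes, periodMinutes, TimeUnit.MINUTES)

  // Broadcasts fresh data, then releases the executors' copies of the old one.
  def refresh(): Unit = {
    val old = current.getAndSet(sc.broadcast(load()))
    old.unpersist(blocking = false)
  }

  // Latest broadcast; read this on the driver once per batch.
  def get: Broadcast[Map[String, String]] = current.get()

  def stop(): Unit = timer.shutdownNow()
}
```

The stream would need to call `get` inside `transform` or `foreachRDD` for every batch, rather than capturing one broadcast when the job is set up, otherwise the tasks keep seeing the original value.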

Upvotes: 0
