Suresh Parmar

Reputation: 823

How to reuse a Spark RDD after stopping the context

I have created an RDD. Below is my program.

public static void main(String[] args) throws JSONException, IOException, InterruptedException {
    SparkConf conf1 = new SparkConf().setAppName("SparkAutomation").setMaster("local");
    new SparkAutomation().run(conf1);
}

private void run(SparkConf conf) throws JSONException, IOException, InterruptedException {
    JavaSparkContext sc = new JavaSparkContext(conf);
    getDataFrom(sc);
    sc.stop();
}

void getDataFrom(JavaSparkContext sc) throws JSONException, IOException, InterruptedException {
    JavaRDD<String> Data = sc.textFile("/path/to/File");
}

I want to use the RDD created above in another part of the application. I have to stop the context and create another one, and I want to use the above RDD there. My question is: will I be able to use the RDD if I persist it to memory?

  Data.persist(StorageLevel.MEMORY_ONLY());

Or do I have to persist it to disk?

  Data.persist(StorageLevel.DISK_ONLY());

Upvotes: 2

Views: 1323

Answers (1)

Hamel Kothari

Reputation: 737

You won't be able to reuse that RDD in either case if you need to restart your Spark context. Data persisted with RDD.persist is not accessible outside of your Spark context; every RDD is tied to the specific Spark context that created it.

If you want to stop the context and start a new one consider persisting to an underlying datastore using something like RDD.saveAsTextFile("/saved/rdd/path") and then reading a new RDD in the new Spark Context using sc.textFile("/saved/rdd/path").
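Putting those two steps together, a minimal sketch might look like the following. It reuses the question's local master and app name; the save path `/saved/rdd/path` is hypothetical (and with saveAsTextFile it must not already exist).

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddHandoff {
    public static void main(String[] args) {
        // First context: build the RDD and write it to a durable store.
        SparkConf conf1 = new SparkConf().setAppName("SparkAutomation").setMaster("local");
        JavaSparkContext sc1 = new JavaSparkContext(conf1);
        JavaRDD<String> data = sc1.textFile("/path/to/File");
        data.saveAsTextFile("/saved/rdd/path"); // hypothetical path; must not exist yet
        sc1.stop();

        // Second context: read the saved files back as a fresh RDD.
        SparkConf conf2 = new SparkConf().setAppName("SparkAutomation").setMaster("local");
        JavaSparkContext sc2 = new JavaSparkContext(conf2);
        JavaRDD<String> reloaded = sc2.textFile("/saved/rdd/path");
        long count = reloaded.count(); // the reloaded RDD belongs to the new context
        sc2.stop();
    }
}
```

The reloaded RDD is a new object built from the saved files, not the original RDD, but it carries the same data and can be persisted or transformed under the new context as usual.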

Upvotes: 5
