Abhinav Rawat

Reputation: 452

Does Spark maintain a separate lineage graph for each RDD it creates?

I have a question about how the DAG is created during Spark execution. Take this code snippet as an example:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public static void main(String[] args) {
    System.setProperty("hadoop.home.dir", "C:\\winutils");
    SparkConf conf = new SparkConf().setAppName("MyFirstProgram").setMaster("local[*]");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Each transformation derives a new RDD from the previous one.
    JavaRDD<Integer> rdd1 = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
    JavaRDD<Integer> rdd2 = rdd1.filter(x -> x > 2 && x < 8);
    JavaRDD<Integer> rdd3 = rdd2.map(x -> x % 2 == 0 ? x * x : x * x * x);

    // collect() is the action that actually triggers job execution.
    List<Integer> list = rdd3.collect();

    for (int i : list) {
        System.out.println(i);
    }
    sc.close();
}

Does Spark create a separate DAG/lineage graph for each RDD, or does it maintain a single DAG, adding vertices to it as it encounters each transformation?

In other words, for the above program:

Will there be only a single DAG for all the RDDs, like below? [image: one combined DAG containing rdd1, rdd2, and rdd3]

Or, as shown below, three separate lineage graphs, one each for rdd1, rdd2, and rdd3? [image: three separate lineage graphs]

Upvotes: 0

Views: 597

Answers (1)

user10176119

Reputation: 26

Each RDD has its own lineage/DAG. There is no "global" DAG covering all transformations in the application.

However, nodes (RDDs) are "shared" between those DAGs: rdd1 in all three lineage graphs refers to the same object.
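You can observe this yourself by printing each RDD's lineage with toDebugString(). Below is a minimal sketch using the same RDDs as in your question (the class name LineageDemo is just for illustration):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LineageDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("LineageDemo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<Integer> rdd1 = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
        JavaRDD<Integer> rdd2 = rdd1.filter(x -> x > 2 && x < 8);
        JavaRDD<Integer> rdd3 = rdd2.map(x -> x % 2 == 0 ? x * x : x * x * x);

        // Each RDD reports only its own lineage, traced back to the source:
        // rdd1's debug string shows just the ParallelCollectionRDD,
        // rdd2's adds the filter step on top of rdd1's lineage,
        // and rdd3's adds the map step on top of rdd2's.
        System.out.println(rdd1.toDebugString());
        System.out.println(rdd2.toDebugString());
        System.out.println(rdd3.toDebugString());

        sc.close();
    }
}

Note that rdd3's output contains rdd2's and rdd1's steps as a prefix: the three lineages overlap because they share the same upstream RDD objects, but each RDD still carries its own graph.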

Upvotes: 1
