Reputation: 43
I am new to Apache Spark. I am trying to understand the DAG that Spark builds as transformations are applied one after another, and which gets executed only once an action is performed.
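For example, here is roughly the kind of pipeline I have in mind (a toy sketch I put together; the names are made up):

```scala
import org.apache.spark.sql.SparkSession

object DagDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DagDemo").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Transformations: nothing runs yet; Spark only records
    // the lineage (the DAG) of how each RDD is derived.
    val numbers = sc.parallelize(1 to 100)
    val squares = numbers.map(n => n * n)
    val evens   = squares.filter(_ % 2 == 0)

    // Action: only now is the whole DAG actually executed.
    val total = evens.reduce(_ + _)
    println(s"sum of even squares = $total")

    spark.stop()
  }
}
```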
What I could make out is that in the event of a job failure, the DAG comes to the rescue. Since all the intermediate RDDs are stored in memory, Spark knows up to which step the job ran successfully and restarts the job from that point only, instead of starting the job from the beginning.
Now I have several questions here:

1. Can DAG make Spark resilient to node failures?
2. Is it the driver node which maintains the DAG?
3. Can there be multiple DAGs for a single execution?
Upvotes: 2
Views: 1119
Reputation: 2998
I think your understanding above is not fully correct. Spark does not keep all the intermediate RDDs in memory; instead, the DAG records the lineage of transformations, so that if a partition is lost it can be recomputed from its source rather than the whole job being restarted.
Your Question 1: Can DAG make Spark resilient to node failures?
Yes, the DAG makes Spark fault tolerant to node failures. If an executor is lost, the partitions it held are recomputed on another node by replaying the lineage recorded in the DAG.
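To make this concrete, lineage is what Spark falls back on after a failure, and checkpointing is a way to truncate a long lineage so recovery does not replay everything from the source. A minimal sketch (the checkpoint directory is a placeholder path):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("LineageDemo").master("local[*]").getOrCreate()
val sc = spark.sparkContext
sc.setCheckpointDir("/tmp/spark-checkpoints") // placeholder path

val base    = sc.parallelize(1 to 1000)
val derived = base.map(_ * 2).filter(_ % 3 == 0)

// The recorded lineage; after a node failure Spark replays it
// to recompute only the lost partitions.
println(derived.toDebugString)

// Checkpointing materializes the RDD and truncates the lineage,
// so recovery restarts from the checkpoint instead of the source.
derived.checkpoint()
derived.count()                // an action forces the checkpoint to happen
println(derived.toDebugString) // lineage is now rooted at the checkpoint
```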
Question 2: Is it the driver node which maintains the DAG?
Yes. When an action is called, the DAG built so far is submitted to the DAGScheduler running on the driver, which splits the job into stages of tasks at shuffle boundaries.
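You can see such a stage boundary yourself. Reusing the SparkContext `sc` from the sketch above, `reduceByKey` needs a shuffle, so the DAGScheduler cuts the job into two stages there:

```scala
val lines  = sc.parallelize(Seq("a b", "b c", "a c"))
val counts = lines.flatMap(_.split(" ")).map(w => (w, 1)).reduceByKey(_ + _)

// The indentation in toDebugString reflects the shuffle boundary
// where the DAGScheduler splits the job into two stages.
println(counts.toDebugString)
counts.collect() // action: the driver submits both stages for execution
```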
Question 3: Can there be multiple DAGs for a single execution?
No, you cannot have multiple DAGs, because the DAG is the graph that represents the operations you performed. Each action triggers a job, and each job is planned and executed from exactly one DAG.
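For what it's worth, calling two actions creates two separate jobs, but each job still has exactly one DAG (again assuming the `sc` from above):

```scala
val data = sc.parallelize(1 to 10).map(_ + 1)

// Two actions trigger two separate jobs, but each job is
// executed from a single DAG built from the same lineage.
println(data.count()) // job 1: one DAG
println(data.sum())   // job 2: one DAG
```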
Upvotes: 3