Reputation: 55
I want to understand how a Spark DAG is created. Suppose I have a Spark driver program that performs 3 Spark actions (say, writing data to S3):
val df1 = spark.read.text("s3://onepath/")
val df2 = df1.select("col1", "col2")
val df3 = spark.read.text("s3://anotherpath/")
df1.write.csv("")
df2.write.csv("")
df3.write.csv("")
I want to understand whether Spark will always write df1, df2 and df3 in this order, or whether it can improvise on its own and start writing df1 and df3 in parallel, since they don't depend on each other, and only then write df2, since it depends on df1.
Upvotes: 1
Views: 193
Reputation: 9504
Spark will always write df1 first, then df2, and then df3.
Every Spark action (write, count, etc.) triggers a Spark job in your application. Since your actions are called sequentially on the driver, Spark won't start a job until the previous one has finished.
If you want to change this behaviour and make the writes run concurrently, you can run them in different threads or use any other concurrent execution framework, for example:
Seq(df1, df2, df3).par.foreach(df => df.write.csv(""))
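The same idea can be sketched with plain Scala Futures instead of parallel collections; this assumes df1, df2 and df3 are defined as above and keeps the empty output paths as placeholders:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Each write is submitted from its own thread, so the three jobs can
// run concurrently if the cluster has enough free resources.
val jobs = Seq(df1, df2, df3).map { df =>
  Future { df.write.csv("") } // "" is a placeholder path, as in the question
}

// Block the driver until all three writes have finished.
Await.result(Future.sequence(jobs), Duration.Inf)

Either way, the concurrent jobs still share the same SparkContext, so the scheduler decides how executor resources are split between them.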
Upvotes: 3