Reputation: 169
df = spark.read.format('csv').load('...')
My understanding is that load is a transformation and executes only when an action is called. However, when the load statement runs, it shows up in the Spark UI as if it were an action.
Edit:
From the comments and answers, I gather that load may or may not be a transformation, but it is definitely not an action, which makes sense.
If it is not an action, why does it create a DAG? It creates a DAG for the load statement alone, not just the WholeStageCodegen entry (which is in the SQL tab). Please see the screenshot.
Upvotes: 4
Views: 4150
Reputation: 1642
load is neither an action nor a transformation. It is a method of the DataFrameReader class that describes how to load data from an external data source.
All methods of DataFrameReader merely describe how the data will be loaded; they do not trigger a Spark job until an action is called.
This is described by Jacek Laskowski; please read https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-DataFrameReader.html#methods
You can also refer to the transformation and action API list from Databricks here: https://training.databricks.com/visualapi.pdf. load is not mentioned anywhere in it as a transformation or an action.
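A hedged sketch that may help with the DAG question: one commonly seen reason for a job appearing in the Spark UI right after load on a CSV source is that, when no schema is supplied, Spark has to look at the file to work out the columns. Assuming that is what is happening here, passing an explicit schema to the reader should leave load as a pure description. The function name, app name, and the two columns below are hypothetical placeholders, not taken from the question.

```python
def load_csv_with_schema(path):
    """Return a lazily defined DataFrame for a CSV file, using an explicit schema."""
    # Imported inside the function so the sketch can be read without a Spark install.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.appName("load-demo").getOrCreate()

    # Hypothetical two-column layout; replace with the real columns of your file.
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])

    # With a user-supplied schema, the reader does not need to inspect the file
    # to discover the columns, so this line only describes the load.
    return spark.read.format("csv").schema(schema).load(path)
```

Calling an action such as `df.count()` on the returned DataFrame is what should then show up as a job; whether the Jobs tab stays empty before that may depend on the Spark version and the number of input paths.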
Upvotes: 0
Reputation: 18043
Specifically, based on your comments:
load does nothing on its own. It is just part of the sqlContext.read or spark.read.format API, a parameter that can be set directly or indirectly on the read; read is what allows the data format to be specified.
The DataFrame, or its underlying RDD, is evaluated lazily, as they say.
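To make "evaluated lazily" concrete, here is a toy model in plain Python. It is not how Spark is implemented; the LazyReader and LazyFrame classes and the read_rows helper are hypothetical names made up for this sketch. It only shows the idea that load and filter record a recipe, and that the source is read when an action such as count is called.

```python
class LazyFrame:
    """Toy model: holds a recipe (source + steps), not data."""

    def __init__(self, read_source, steps=()):
        self._read_source = read_source  # callable that actually reads rows
        self._steps = tuple(steps)       # recorded transformations

    def filter(self, predicate):
        # Transformation: extend the recipe and return a new frame. No rows are read.
        return LazyFrame(self._read_source, self._steps + (predicate,))

    def count(self):
        # Action: only now is the source read and the recipe applied.
        rows = self._read_source()
        for predicate in self._steps:
            rows = [row for row in rows if predicate(row)]
        return len(rows)


class LazyReader:
    def load(self, read_source):
        # Describes where the data comes from; does not call read_source.
        return LazyFrame(read_source)


reads = []  # records every real read of the source


def read_rows():
    reads.append("read")
    return [1, 2, 3, 4]


frame = LazyReader().load(read_rows).filter(lambda n: n % 2 == 0)
reads_before_action = len(reads)  # nothing has been read yet
even_count = frame.count()        # the action triggers the read
reads_after_action = len(reads)
```

The read counter stays at zero until count runs, which mirrors the claim that load on its own only describes the work.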
Upvotes: 1