j raj

Reputation: 169

Is 'load' command in spark an action or transformation?

df = spark.read.format('csv').load('...')

My understanding is that load is a transformation and executes only when an action is called. However, when the load statement runs, it shows up in the Spark UI as if it were an action.

Edit:

From the comments and answers, I infer that load may or may not be a transformation, but it is definitely not an action, which is understandable.

If it is not an action, why does it create a DAG? A DAG is created for the load statement alone, not just a WholeStageCodegen (which appears in the SQL tab). Please see the image below: Screenshot

Upvotes: 4

Views: 4150

Answers (2)

Strick

Reputation: 1642

load is neither an action nor a transformation; it is a method of the DataFrameReader class that describes how to load data from an external data source.

All methods of DataFrameReader merely describe the process of loading data and do not trigger a Spark job (until an action is called).

This is explained by Jacek Laskowski; please read https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-DataFrameReader.html#methods

You can also refer to the transformation and action API list from Databricks here: https://training.databricks.com/visualapi.pdf. load is not mentioned anywhere in it as a transformation or an action.
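To make the "describe now, run later" idea concrete, here is a minimal pure-Python sketch. It is a toy model, not Spark's implementation: ToyReader, ToyFrame, and the file_reads counter are hypothetical names invented for illustration. The reader methods only record settings, and the file is opened only when the action count() is called.

```python
import csv
import os
import tempfile


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: it stores a plan, not data."""

    def __init__(self, fmt, path, options, stats):
        self.fmt = fmt
        self.path = path
        self.options = options
        self.stats = stats  # shared counter of real file reads

    def count(self):
        """An 'action': only here is the file actually opened and scanned."""
        self.stats["file_reads"] += 1
        with open(self.path, newline="") as handle:
            rows = list(csv.reader(handle))
        # Skip the header row when the caller declared one.
        if self.options.get("header") == "true":
            return len(rows) - 1
        return len(rows)


class ToyReader:
    """Hypothetical stand-in for DataFrameReader: methods only record settings."""

    def __init__(self):
        self.fmt = None
        self.options = {}
        self.stats = {"file_reads": 0}

    def format(self, fmt):
        self.fmt = fmt
        return self

    def option(self, key, value):
        self.options[key] = value
        return self

    def load(self, path):
        # No I/O happens here; the path is only remembered.
        return ToyFrame(self.fmt, path, dict(self.options), self.stats)


# Write a small CSV file to a temporary location for the demonstration.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "people.csv")
with open(csv_path, "w", newline="") as handle:
    handle.write("name,age\nann,31\nbob,45\n")

reader = ToyReader()
frame = reader.format("csv").option("header", "true").load(csv_path)
reads_after_load = reader.stats["file_reads"]   # nothing has been read yet
row_count = frame.count()                        # the action triggers the read
reads_after_count = reader.stats["file_reads"]
print(reads_after_load, row_count, reads_after_count)
```

In this toy, the chain format(...).option(...).load(...) mirrors the shape of the call in the question, and the counter only moves once count() runs.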

Upvotes: 0

Ged

Reputation: 18043

Specifically, based on your comments:

load does nothing on its own. It is just part of the sqlContext.read or spark.read.format API, taking the path as a parameter that can be set directly or indirectly on the read. read allows the data format to be specified.

The DataFrame, or its underlying RDD, is evaluated lazily, as they say.
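Lazy evaluation has one visible caveat that may explain the job seen in the Spark UI. As far as I know, when a CSV source is loaded without a user-supplied schema, Spark has to look at the file at load time to work out the columns (and scans more of it when inferSchema is enabled), and that shows up as a small job; supplying a schema up front via DataFrameReader.schema(...) normally avoids it. The sketch below is a toy model of that idea in plain Python, not Spark code: toy_load and file_opens are hypothetical names.

```python
import os
import tempfile

file_opens = 0  # how many times the toy loader touched the file


def toy_load(path, schema=None):
    """Build a plan for reading `path`; peek at the file only if no schema is given."""
    global file_opens
    if schema is None:
        # Eager step: read just the first line to learn the column names.
        file_opens += 1
        with open(path) as handle:
            schema = handle.readline().strip().split(",")
    # The returned plan holds metadata only; no rows are read here.
    return {"path": path, "schema": list(schema)}


# Write a small CSV file to a temporary location for the demonstration.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "people.csv")
with open(csv_path, "w") as handle:
    handle.write("name,age\nann,31\n")

inferred = toy_load(csv_path)                      # no schema: header is peeked
opens_without_schema = file_opens
declared = toy_load(csv_path, schema=["name", "age"])  # schema given: no peek
opens_with_schema = file_opens - opens_without_schema
print(opens_without_schema, opens_with_schema, inferred["schema"] == declared["schema"])
```

If this matches what happens in your case, the DAG you see belongs to the schema lookup rather than to a full read of the data.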

Upvotes: 1
