user8482601

Reputation: 87

What is the difference between the different read options in Spark?

I am reading a CSV file with the following code:

    from pyspark.sql import SparkSession
    spark = SparkSession.builder \
            .master("local[2]") \
            .getOrCreate()

Now there are four different options to read:

  1. df = spark.read.load("/..../xyz.csv")
  2. df = spark.read.csv("/..../xyz.csv")
  3. df = spark.read.format('csv').load("/..../xyz.csv")
  4. df = spark.read.option().csv("/..../xyz.csv")

Which option should I use?

EDIT:

Also, both inferSchema="true" and inferSchema=True work. Can we safely use either one?

Upvotes: 5

Views: 5522

Answers (2)

OneCricketeer

Reputation: 191743

2 and 3 are equivalent.

3 also lets you chain option(key, value) calls (see 4, or spark.read.format('csv').option(...).load()), which you could use to skip a header row or to set a delimiter other than a comma, for example.
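
As a rough sketch of how options 2, 3 and 4 line up: the function name `read_csv_variants` and the `header`/`sep` values below are only illustrative choices, and `spark` is assumed to be the session created in the question.

```python
def read_csv_variants(spark, path):
    """Read the same CSV file via options 2, 3 and 4 from the question.

    `spark` is assumed to be an existing SparkSession; `path` is the CSV path.
    The header/sep settings are example values, not requirements.
    """
    # Option 2: csv() accepts the CSV settings as keyword arguments.
    df2 = spark.read.csv(path, header=True, sep=";")

    # Option 3: format("csv") plus load(), with the same settings as **options.
    df3 = spark.read.format("csv").load(path, header=True, sep=";")

    # Option 4: chain option(key, value) calls, then finish with csv().
    df4 = spark.read.option("header", True).option("sep", ";").csv(path)

    return df2, df3, df4
```

All three calls should hand the same settings to the CSV reader, so which one to pick is mostly a matter of style.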

    def load(self, path=None, format=None, schema=None, **options):
        """Loads data from a data source and returns it as a :class:`DataFrame`.

        :param path: optional string or a list of string for file-system backed data sources.
        :param format: optional string for format of the data source. Default to 'parquet'.
        :param schema: optional :class:`pyspark.sql.types.StructType` for the input schema
                       or a DDL-formatted string (For example ``col0 INT, col1 DOUBLE``).
        :param options: all other string options
        """

1 does not parse CSV; load() uses Parquet as the default format unless you tell it otherwise.
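
If you do want to use load() directly for a CSV file, the `format` parameter from the signature can be passed explicitly. A minimal sketch, assuming `spark` is an existing SparkSession; the helper name `load_as_csv` and the `header=True` setting are only illustrative:

```python
def load_as_csv(spark, path):
    """Read a CSV file through the generic load() entry point.

    `spark` is assumed to be an existing SparkSession.
    """
    # format="csv" overrides load()'s Parquet default;
    # header=True is passed through as one of the **options.
    return spark.read.load(path, format="csv", header=True)
```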

I would suggest inferSchema=True rather than inferSchema="true", to prevent typos in the string value.
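
As far as I know, both spellings work because PySpark turns Python booleans into lowercase strings before handing the options to the JVM. The helper below is a hypothetical stand-in that mirrors that conversion for illustration; it is not the library's own function.

```python
def option_to_str(value):
    """Illustrative stand-in for how option values get stringified.

    Booleans become lowercase strings, None stays None,
    and everything else goes through str().
    """
    if isinstance(value, bool):
        return str(value).lower()
    if value is None:
        return None
    return str(value)


print(option_to_str(True))    # true
print(option_to_str("true"))  # true
```

So True and "true" end up as the same option value, but the boolean cannot be misspelled the way a string can.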

Upvotes: 8

M. Alexandru

Reputation: 624

2 is an alias for 3. 1 reads Parquet files by default.

For example, spark.read.csv() just calls .format("csv").load("path"):

  @scala.annotation.varargs
  def csv(paths: String*): DataFrame = format("csv").load(paths : _*)

It doesn't matter which of 2, 3 or 4 you use. As I said, 1 reads Parquet by default.

Upvotes: 5
