Jitu

Reputation: 91

Read TSV file in pyspark

What is the best way to read a .tsv file with a header in PySpark and store it in a Spark DataFrame?

I have tried the "spark.read.options" and "spark.read.csv" commands, but with no luck so far.

Thanks.

Regards, Jit

Upvotes: 7

Views: 18419

Answers (2)

vishnu_analytics

Reputation: 1

Please try this; it worked for me:

# Read a tab-separated file with a header row; infer column types
df = (
    spark.read
    .format("csv")
    .options(inferSchema="True", header="True", sep="\t")
    .load(path)
)
display(df)
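Note that display() is specific to Databricks notebooks. In a plain PySpark session, a minimal self-contained sketch of the same read (the file path below is just a placeholder) would look like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read_tsv").getOrCreate()

# Read a tab-separated file with a header row; column types are inferred.
df = (
    spark.read
    .format("csv")
    .options(inferSchema="True", header="True", sep="\t")
    .load("/path/to/file.tsv")  # placeholder path
)

df.show(5)        # print the first rows instead of display()
df.printSchema()  # confirm the inferred column types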

Upvotes: 0

Shubham Jain

Reputation: 5526

You can read the .tsv file directly, without providing an external schema, as long as a header row is available:

df = spark.read.csv(path, sep=r'\t', header=True).select('col1','col2')

Since Spark is evaluated lazily and prunes unused columns, only the selected columns are parsed. Hope it helps.
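If you want to confirm the pruning, one way (a small sketch reusing the same hypothetical col1/col2 columns) is to inspect the physical plan; the ReadSchema of the FileScan node should list only the selected columns:

df = spark.read.csv(path, sep=r'\t', header=True).select('col1', 'col2')

# The FileScan csv node's ReadSchema should show only col1 and col2,
# indicating that the remaining columns are not parsed.
df.explain()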

Upvotes: 12
