Reputation: 91
What is the best way to read a .tsv file with a header in PySpark and store it in a Spark DataFrame?
I have been trying the "spark.read.options" and "spark.read.csv" commands, but with no luck.
Thanks.
Regards, Jit
Upvotes: 7
Views: 18419
Reputation: 1
Please try this – it worked for me:
# Read the .tsv file with header and inferred schema
df = spark.read \
    .format("csv") \
    .options(inferSchema="True", header="True", sep="\t") \
    .load(path)

display(df)
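As a quick sanity check (a minimal sketch, assuming the df and Spark session from above), you can confirm that the header and inferred column types were picked up:

# Inspect the inferred schema and preview a few rows (assumes df from the snippet above)
df.printSchema()
df.show(5, truncate=False)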
Upvotes: 0
Reputation: 5526
Well, you can read the .tsv file directly without providing an external schema, as long as a header row is available:
df = spark.read.csv(path, sep=r'\t', header=True).select('col1','col2')
Since Spark is lazily evaluated, it will read only the selected columns. Hope it helps.
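If you already know the column types, a minimal sketch like the following (with placeholder column names col1/col2 and made-up types, so adjust to your file) passes an explicit schema instead of inferSchema, which avoids the extra pass over the data that schema inference requires:

# Hypothetical DDL schema string; replace names and types with your own columns
schema = "col1 STRING, col2 INT"
df = spark.read.csv(path, sep='\t', header=True, schema=schema)
df.select('col1', 'col2').show()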
Upvotes: 12