Reputation: 91
What is the best way to read a .tsv file with a header in PySpark and store it in a Spark DataFrame?
I have been trying the "spark.read.options" and "spark.read.csv" commands, but with no luck.
Thanks.
Regards, Jit
Upvotes: 7
Views: 18419
Reputation: 1
Please try this – it worked for me:
# Read the .tsv file with header and inferred schema
df = spark.read \
    .format("csv") \
    .options(inferSchema="True", header="True", sep="\t") \
    .load(path)

display(df)
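As a quick sanity check (a minimal sketch, assuming the df and Spark session from above), you can confirm that the header and inferred column types were picked up:

# Inspect the inferred schema and preview a few rows (assumes df from the snippet above)
df.printSchema()
df.show(5, truncate=False)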
Upvotes: 0
Reputation: 5526
Well, you can read the .tsv file directly without providing an external schema, as long as a header row is available:
df = spark.read.csv(path, sep=r'\t', header=True).select('col1','col2')
Since Spark is lazily evaluated, it will read only the selected columns. Hope it helps.
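If you already know the column types, a minimal sketch like the following (with placeholder column names col1/col2 and made-up types, so adjust to your file) passes an explicit schema instead of inferSchema, which avoids the extra pass over the data that schema inference requires:

# Hypothetical DDL schema string; replace names and types with your own columns
schema = "col1 STRING, col2 INT"
df = spark.read.csv(path, sep='\t', header=True, schema=schema)
df.select('col1', 'col2').show()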
Upvotes: 12