Kevin Tianyu Xu

Reputation: 734

Write a DataFrame to csv file with a custom row/line delimiter/separator

I need to produce a delimited file where each row is separated by '^' and columns are delimited by '|'.

There doesn't seem to be an option to change the row delimiter for the csv output type.

For example:

# There is no option for setting the line separator to '^'
df.coalesce(1).write\
.format("com.databricks.spark.csv")\
.mode("overwrite")\
.option("header", "true")\
.option("sep","|")\
.save(destination_path)

Upvotes: 1

Views: 5548

Answers (2)

Kevin Tianyu Xu

Reputation: 734

In PySpark 3.0 and later, the CSV writer accepts a lineSep option that sets the line separator:

df.coalesce(1).write\
.format("com.databricks.spark.csv")\
.mode("overwrite")\
.option("header", "true")\
.option("sep","|")\
.option("lineSep","^")\
.save(destination_path)

Upvotes: 0

MahzadK

Reputation: 26

One solution is to convert the DataFrame to an RDD and save it as text (Scala):

df.rdd.map(x=>x.mkString("^")).saveAsTextFile("OutCSV")
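Note that the one-liner above joins the fields of each row with '^', and saveAsTextFile still ends every record with a newline. To make the layout asked for in the question concrete (columns joined by '|', rows joined by '^'), here is a minimal plain-Python sketch. The helper name to_delimited and the sample data are hypothetical, and it assumes the rows are small enough to hold in memory, for example the result of df.collect():

```python
def to_delimited(rows, col_sep="|", row_sep="^"):
    """Join each row's values with col_sep, then join the rows with row_sep."""
    return row_sep.join(col_sep.join(str(v) for v in row) for row in rows)


# Hypothetical sample data; Spark Row objects are tuples, so the list
# returned by df.collect() could be passed in the same way.
header = ["id", "name"]
rows = [(1, "a"), (2, "b")]

text = to_delimited([header] + rows)
print(text)  # id|name^1|a^2|b
```

This does no quoting or escaping, so values that themselves contain '|' or '^' would need extra handling.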

Upvotes: 1
