user2221654

Reputation: 321

How to write a Spark DataFrame as a tab-delimited text file using Java

I have a Spark Dataset<Row> with a lot of columns that has to be written to a text file with a tab delimiter. With CSV it's easy to specify that option, but how do I handle this for a text file when using Java?

Upvotes: 0

Views: 6073

Answers (1)

Ram Ghadiyaram

Reputation: 29155

Option 1 :

    yourDf
    .coalesce(1) // if you want to save as a single file
    .write()
    .option("sep", "\t")
    .option("encoding", "UTF-8")
    .csv("outputpath");

This is the same as writing CSV, except that you specify a tab as the delimiter.

Yes, the output is CSV, as you mentioned in the comment. If you want to rename it, you can do the following:


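    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Spark writes "outputpath" as a directory of part files; this renames that directory
    FileSystem fs = FileSystem.get(spark.sparkContext().hadoopConfiguration());
    fs.rename(new Path("outputpath"), new Path("outputpath.txt"));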

Note: 1) You can use fs.globStatus if you have multiple files under your outputpath; in this case coalesce(1) produces a single CSV file, so it is not needed. 2) If you are using S3 instead of HDFS, you may need to set the following before attempting to rename:

    spark.sparkContext().hadoopConfiguration().set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
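
For output written to a local filesystem, the idea behind note 1 (find the part file Spark wrote inside the output directory, then rename it) can be sketched without the Hadoop API. This is only an illustration using java.nio.file, not the Hadoop FileSystem calls shown above, so it will not work against HDFS or S3. The class and method names (PartFileRenamer, moveSinglePartFile) are hypothetical, and the sketch assumes exactly one part-* file exists because coalesce(1) was used.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class PartFileRenamer {

    // Moves the single part-* file found in outputDir to target and returns target.
    // Throws if there is not exactly one part file (e.g. coalesce(1) was not used).
    public static Path moveSinglePartFile(Path outputDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        // The glob "part-*" skips marker files such as _SUCCESS and hidden .crc files.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        if (parts.size() != 1) {
            throw new IllegalStateException(
                "Expected exactly one part file in " + outputDir + ", found " + parts.size());
        }
        return Files.move(parts.get(0), target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A call such as PartFileRenamer.moveSinglePartFile(Paths.get("outputpath"), Paths.get("result.txt")) would leave the data in a regular file instead of a directory; the target name here is just an example.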

Option 2 :

Another option (if you don't want to use the CSV API) is the following:

    yourDf.javaRDD()
    .coalesce(1)
    .map(row -> row.mkString("\t"))
    .saveAsTextFile("yourfile.txt");
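
One thing to keep in mind with Option 2: mkString("\t") only joins the column values with the separator. It does not quote or escape anything, so a value that itself contains a tab or a line break will shift the columns in the output. The plain-Java sketch below mimics that joining behaviour without Spark to make the point. The TsvLine class and its methods are hypothetical helper names, and replacing embedded tabs and line breaks with a space is just one possible way to sanitise the values.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TsvLine {

    // Joins values with a tab the way a plain mkString("\t") would:
    // no quoting, no escaping; null is rendered as the text "null".
    public static String join(List<?> values) {
        return values.stream()
                .map(v -> String.valueOf(v))
                .collect(Collectors.joining("\t"));
    }

    // Same join, but embedded tabs and line breaks are replaced with a space first,
    // so that every output line keeps one field per column.
    public static String joinSanitized(List<?> values) {
        return values.stream()
                .map(v -> String.valueOf(v).replaceAll("[\\t\\r\\n]", " "))
                .collect(Collectors.joining("\t"));
    }
}
```

For a three-column row whose second value is "a\tb", join yields four tab-separated fields while joinSanitized yields three. If your data may contain tabs, you might apply similar cleaning inside the map step, or prefer Option 1, where the CSV writer handles quoting.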

Upvotes: 3
