Reputation: 589
I want to partition my results and save them as CSV files in a specified location. However, I couldn't find an option to specify the file format with the code below: all the output files are named part-000**. How can I specify the required file format here?
records.repartition(partitionNum).saveAsTextFile(path)
Upvotes: 0
Views: 630
Reputation: 174
You can try this:
df.coalesce(1).write.option("header",true).csv(path)
Here path is a directory, not a file, and it must not already exist. Spark always names the files inside it part-*, so you cannot choose the CSV file name directly. You can, however, rename the file afterwards with the Hadoop FileSystem API, which ships with Spark.
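import org.apache.hadoop.fs._
// hdfsFolder and fileName are the target directory and file name you choose
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// Name of the single part file written by coalesce(1)
val file = fs.globStatus(new Path(s"$path/part*"))(0).getPath().getName()
// rename returns false if the move failed
val result: Boolean = fs.rename(new Path(s"$path/$file"), new Path(s"$hdfsFolder/$fileName"))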
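Putting the two steps together, here is a minimal sketch of a helper that writes the DataFrame to a temporary directory and then moves the single part file to the name you want. The names SingleCsvWriter, writeSingleCsv, tmpDir and targetFile are made up for illustration and are not part of Spark; the sketch also assumes the data is small enough for coalesce(1), that tmpDir does not exist yet, and that targetFile is on the same filesystem as tmpDir.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleCsvWriter {

  /** Hypothetical helper (not a Spark API): writes df as one CSV file at targetFile. */
  def writeSingleCsv(spark: SparkSession,
                     df: DataFrame,
                     tmpDir: String,
                     targetFile: String): Boolean = {
    // Spark always writes a directory; coalesce(1) leaves a single part file in it.
    df.coalesce(1).write.option("header", "true").csv(tmpDir)

    val tmpPath = new Path(tmpDir)
    val fs = tmpPath.getFileSystem(spark.sparkContext.hadoopConfiguration)

    // Locate the part file that Spark produced.
    val parts = fs.globStatus(new Path(tmpPath, "part-*"))
    require(parts != null && parts.length == 1,
      s"expected exactly one part file in $tmpDir")

    // Make sure the destination directory exists, then move the file.
    val target = new Path(targetFile)
    Option(target.getParent).foreach(p => fs.mkdirs(p))
    val renamed = fs.rename(parts(0).getPath, target)

    // Remove the temporary directory (including _SUCCESS) once the move succeeded.
    if (renamed) fs.delete(tmpPath, true)
    renamed
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("single-csv").getOrCreate()
    import spark.implicits._

    // Example data and paths are placeholders.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
    val base = java.nio.file.Files.createTempDirectory("single-csv").toString
    val ok = writeSingleCsv(spark, df, s"$base/tmp", s"$base/out/result.csv")
    println(s"renamed: $ok")

    spark.stop()
  }
}
```

If you start from an RDD, as in the question, you would first need to convert it to a DataFrame (or map each record to a CSV line yourself) before a helper like this applies.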
Upvotes: 1