Reputation: 1
I'm trying to export a DataFrame to a CSV file using .NET for Apache Spark, but the exported file gets the default name 'part-00000-{GUID}'. What I want is to set the file name according to my business rules, e.g. 'ABC_20200504.csv'.
This is my code:
string pathSource = Path.Combine(path, folderName);
exportDataFrame
.Coalesce(1)
.Write()
.Option("header", "false")
.Mode(SaveMode.Append)
.Csv(pathSource);
I tried changing pathSource so that it points to 'test.csv', but with this approach I always get a directory named 'test.csv', and the part file ends up inside that directory.
I really need a solution for this. If someone could help, I would be very thankful.
Upvotes: 0
Views: 175
Reputation: 1244
Try this code:
exportDataFrame
.Repartition(1)
.Write()
.Mode("overwrite")
.Format("csv")
.Option("header", "true")
.Save("ABC_20200504.csv");
This should write a single output file inside a directory named after the path, i.e. \ABC_20200504.csv\part-00000
You can then rename the part-00000 file, as in this example:
System.IO.File.Move("D:\\part-00000.txt", "D:\\ABC_20200504.txt");
The original solution was written in Scala, taken from the link below and edited for C#: https://www.dataneb.com/post/how-to-write-single-csv-file-using-spark
The link describes 5 methods for writing a single CSV file.
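Since the part file name contains a generated GUID, the rename step usually has to look the file up rather than hard-code its name. Below is a minimal sketch of that step, assuming the output directory is on the local file system (not HDFS or cloud storage) and contains exactly one part file. `SparkCsvOutput` and `MoveSinglePartFile` are hypothetical names used for illustration; they are not part of .NET for Apache Spark.

```csharp
using System;
using System.IO;

public static class SparkCsvOutput
{
    // Hypothetical helper: moves the single part file that Spark wrote into
    // outputDir to targetPath, then deletes the leftover output directory
    // (including the _SUCCESS marker and .crc files).
    // targetPath must be located outside outputDir.
    public static string MoveSinglePartFile(string outputDir, string targetPath)
    {
        // Spark names data files part-00000-<GUID>...csv. Exactly one is
        // expected after Coalesce(1)/Repartition(1) into a fresh directory.
        string[] parts = Directory.GetFiles(outputDir, "part-*.csv");
        if (parts.Length != 1)
        {
            throw new InvalidOperationException(
                $"Expected exactly one part file in '{outputDir}', found {parts.Length}.");
        }

        // Remove an existing target first so the move cannot fail on it.
        if (File.Exists(targetPath))
        {
            File.Delete(targetPath);
        }
        File.Move(parts[0], targetPath);

        Directory.Delete(outputDir, recursive: true);
        return targetPath;
    }
}
```

For example, after saving the DataFrame to a temporary directory such as "ABC_20200504_tmp" (an assumed name), calling SparkCsvOutput.MoveSinglePartFile("ABC_20200504_tmp", "ABC_20200504.csv") would leave a single file with the desired name. With SaveMode.Append the directory may hold several part files, in which case the helper throws instead of guessing which one to rename.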
Upvotes: 1