João Sousa

Reputation: 1

Is there a way to change the export filename using .NET SPARK?

I'm trying to export a DataFrame to a CSV file using .NET SPARK, but the exported file gets the default name 'part-00000-{GUID}'. What I want is to set the file name according to my business rules, e.g. 'ABC_20200504.csv'.

This is my code:

string pathSource = Path.Combine(path, folderName);

exportDataFrame
    .Coalesce(1)
    .Write()
    .Option("header", "false")
    .Mode(SaveMode.Append)
    .Csv(pathSource);

I tried setting pathSource to 'test.csv' to force the export into that file, but with this approach I always get a directory with that name, and the exported file ends up inside the 'test.csv' folder.

I really need a solution for this. If someone could help, I would be very thankful.

Upvotes: 0

Views: 175

Answers (1)

V. S.

Reputation: 1244

Try this code:

exportDataFrame
    .Repartition(1)
    .Write()
    .Mode("overwrite")
    .Format("com.databricks.spark.csv")
    .Option("header", "true")
    .Save("ABC_20200504.csv");

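This should write a single output file inside a directory named ABC_20200504.csv, i.e. \ABC_20200504.csv\part-00000 (the actual file name carries a suffix, as in the 'part-00000-{GUID}' name from the question).

You can then rename the part-00000 file, as in this example: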

System.IO.File.Move("D:\\part-00000.txt", "D:\\ABC_20200504.txt");  
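Since the real part file name contains a generated suffix, it usually has to be looked up rather than hard-coded. Below is a minimal sketch of that step using only System.IO. The class SparkCsvRename and the method MoveSinglePartFile are hypothetical names made up for this illustration and are not part of .NET for Apache Spark. It assumes the output directory is on a local (or locally mounted) file system, that Spark wrote exactly one part file into it, and that the target path lies outside that directory.

```csharp
using System;
using System.IO;

public static class SparkCsvRename
{
    // Hypothetical helper (not a Spark API): moves the single part file that
    // Spark wrote into outputDir to targetPath, then deletes outputDir.
    // Assumes targetPath is NOT inside outputDir.
    public static string MoveSinglePartFile(string outputDir, string targetPath)
    {
        // Data files are named "part-*". Marker files such as _SUCCESS and
        // hidden checksum files such as ".part-*.crc" do not match.
        string[] parts = Directory.GetFiles(outputDir, "part-*");
        if (parts.Length != 1)
        {
            throw new InvalidOperationException(
                $"Expected exactly one part file in '{outputDir}', found {parts.Length}.");
        }

        // Remove an existing target first so the move cannot fail on it.
        if (File.Exists(targetPath))
        {
            File.Delete(targetPath);
        }
        File.Move(parts[0], targetPath);

        // Clean up the leftover directory and its marker files.
        Directory.Delete(outputDir, recursive: true);
        return targetPath;
    }
}
```

With the variables from the question, a call could look like SparkCsvRename.MoveSinglePartFile(pathSource, Path.Combine(path, "ABC_20200504.csv")). Note that with SaveMode.Append the directory can accumulate several part files across runs, in which case this sketch throws instead of guessing which one to rename.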

The original solution was written in Scala; I took it from the link below and adapted it for C#: https://www.dataneb.com/post/how-to-write-single-csv-file-using-spark

The linked post describes five ways to write a single CSV file.

Upvotes: 1
