Reputation: 1623
I would like to save the content of a Spark DataFrame to a CSV file in an S3 bucket:
df_country.repartition(1).write.csv('s3n://bucket/test/csv/a',sep=",",header=True,mode='overwrite')
The problem is that it creates a file with the name: part-00000-fc644e84-7579-48.
Is there any way to fix the name of this file, for example to test.csv?
Thanks
Best
Upvotes: 0
Views: 3859
Reputation: 299
This is not possible directly: every partition in the job writes its own file, and the file names must follow a strict convention to avoid naming conflicts. The recommended solution is to rename the file after it is created.
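As a rough sketch of the rename step, assuming you use boto3 outside of Spark: S3 has no real rename operation, so the usual approach is to copy the object to the new key and delete the original. The helper name `rename_single_part_file` and the target key `test/csv/test.csv` are made up for illustration; the client is passed in so the function only depends on the `list_objects_v2`, `copy_object` and `delete_object` calls.

```python
def rename_single_part_file(s3, bucket, prefix, target_key):
    """Copy the single part-* object under `prefix` to `target_key`, then delete it.

    `s3` is assumed to be a boto3 S3 client (e.g. boto3.client("s3")).
    Returns the key of the original part file.
    """
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # Keep only Spark's part files; skip markers such as _SUCCESS.
    part_keys = [
        obj["Key"]
        for obj in response.get("Contents", [])
        if obj["Key"].rsplit("/", 1)[-1].startswith("part-")
    ]
    if len(part_keys) != 1:
        raise ValueError(
            f"expected exactly one part file under {prefix!r}, found {len(part_keys)}"
        )
    source_key = part_keys[0]
    # S3 cannot rename in place: copy to the new key, then remove the old one.
    s3.copy_object(
        Bucket=bucket,
        Key=target_key,
        CopySource={"Bucket": bucket, "Key": source_key},
    )
    s3.delete_object(Bucket=bucket, Key=source_key)
    return source_key


# Hypothetical usage after the Spark write has finished:
#   import boto3
#   rename_single_part_file(boto3.client("s3"), "bucket", "test/csv/a/", "test/csv/test.csv")
```

This only makes sense when the write used `repartition(1)` (or `coalesce(1)`), so that exactly one part file exists under the prefix; otherwise the helper raises instead of guessing which file to rename.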
Also, if you know you are only writing one file per path (e.g. s3n://bucket/test/csv/a), then it doesn't really matter what the name of the file is: simply read in all the contents of that unique directory name.
Sources: 1. Specifying the filename when saving a DataFrame as a CSV 2. Spark dataframe save in single file on hdfs location
Upvotes: 1