Poisson

Reputation: 1623

Save a CSV file into an S3 bucket from a PySpark DataFrame

I would like to save the content of a Spark DataFrame into a CSV file in an S3 bucket:

df_country.repartition(1).write.csv('s3n://bucket/test/csv/a', sep=",", header=True, mode='overwrite')

The problem is that it creates a file with a name like part-00000-fc644e84-7579-48.

Is there any way to set the name of this file, for example test.csv?

Thanks

Best

Upvotes: 0

Views: 3859

Answers (1)

Ryan

Reputation: 299

This is not possible: every partition in the job writes its own file and must follow a strict naming convention to avoid conflicts. The recommended approach is to rename the file after it has been written.
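A minimal sketch of that rename step, assuming boto3 is installed and credentials are configured; the bucket name, prefix, and target file name here are taken from the question and are only illustrative:

import boto3

s3 = boto3.client('s3')
bucket = 'bucket'
prefix = 'test/csv/a/'

# List the objects Spark wrote under the output prefix and keep the part files.
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
part_keys = [obj['Key'] for obj in resp.get('Contents', [])
             if obj['Key'].rsplit('/', 1)[-1].startswith('part-')]

# If exactly one part file was written, copy it to a fixed name and delete the original.
if len(part_keys) == 1:
    s3.copy_object(Bucket=bucket,
                   CopySource={'Bucket': bucket, 'Key': part_keys[0]},
                   Key=prefix + 'test.csv')
    s3.delete_object(Bucket=bucket, Key=part_keys[0])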

Also, if you know you are only writing one file per path (e.g. s3n://bucket/test/csv/a), then it doesn't really matter what the file is named: simply read back everything in that unique directory, as in the sketch below.
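For example, a sketch of reading the whole output directory back, assuming an active SparkSession bound to the name spark:

# Read every CSV file under the output directory, whatever the part files are named.
df = spark.read.csv('s3n://bucket/test/csv/a', sep=",", header=True)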

Sources:
1. Specifying the filename when saving a DataFrame as a CSV
2. Spark dataframe save in single file on hdfs location

Upvotes: 1
