Reputation: 54
I have the following in Scala/Spark:
myDataframe
.orderBy("date")
.write
.csv(...)
The generated CSV files are:
part-00000-xxx.csv
part-00001-xxx.csv
part-00002-xxx.csv
Questions:
After running the code above, is the "date" order guaranteed to be preserved inside a single file?
Is it also guaranteed between files? That is, are the "date" values in part-00001 guaranteed to be greater than those in part-00000?
If not, could you please post code that meets both requirements?
Upvotes: 2
Views: 1717
Reputation: 2451
If you call .coalesce(1) before saving, everything is written to a single file
and the order will be preserved.
You can also add a column holding each row's index in that order; it may help you.
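import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}
// A window without partitionBy pulls all rows into one partition to number them.
myDataframe
.withColumn("order", row_number().over(Window.orderBy(col("date"))))
.write
.csv(...)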
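For reference, here is a minimal sketch of the .coalesce(1) suggestion above, assuming a DataFrame with a "date" column. The object name SortedCsvSketch, the helper writeSorted, and the header/overwrite options are illustrative assumptions, not part of the original code.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: sort by "date", then collapse to one partition
// so that Spark writes a single part file.
object SortedCsvSketch {
  def writeSorted(df: DataFrame, path: String): Unit = {
    df
      .orderBy(col("date"))      // global sort on the "date" column
      .coalesce(1)               // merge into one partition, hence one output file
      .write
      .mode("overwrite")         // assumption: replacing existing output is acceptable
      .option("header", "true")  // assumption: a header row is wanted
      .csv(path)
  }
}
```

Usage would be something like SortedCsvSketch.writeSorted(myDataframe, outputPath). If you prefer the sort to be stated explicitly on the single partition, .repartition(1).sortWithinPartitions("date") is an alternative, at the cost of a full shuffle. Either way, a single output file limits write parallelism, so this suits small to medium outputs.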
Upvotes: 1