Klun

Reputation: 54

Spark sortBy: is the order preserved when writing?

I have the following in Scala/Spark:

  myDataframe
   .orderBy("date")
   .write
   .csv(...)

The generated CSV files are:

part-00000-xxx.csv
part-00001-xxx.csv
part-00002-xxx.csv

Questions:

  1. After running the code above, is the "date" order guaranteed to be preserved inside a single file?

  2. Does this also hold between files? That is, are the "date" values in part-00001 guaranteed to be greater than those in part-00000?

  3. If not, could you please post code that meets both requirements?

Upvotes: 2

Views: 1717

Answers (1)

chlebek

Reputation: 2451

If you call .coalesce(1) before saving, everything is written to a single file and the order will be kept.

You can also add a column holding each row's position in the sort order; it may help you.

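import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

myDataframe
  // "order" is the 1-based position of each row when sorted by date
  .withColumn("order", row_number().over(Window.orderBy(col("date"))))
  .write
  .csv(...)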
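
For reference, the .coalesce(1) suggestion could be sketched roughly as below. This is only an illustration, not a definitive implementation: the object name SingleSortedCsv, the helper writeSorted, the sample rows, and the output path /tmp/sorted_csv_example are all made up for the example, and it assumes "date" is stored as an ISO-8601 string so that string order matches chronological order.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleSortedCsv {

  // Hypothetical helper: sort by "date", then merge all partitions into one
  // so that Spark writes a single part file.
  def writeSorted(df: DataFrame, path: String): Unit =
    df.orderBy("date")
      .coalesce(1)        // one output partition => one part-00000-xxx.csv
      .write
      .mode("overwrite")  // replace any previous output at this path
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("single-sorted-csv")
      .master("local[*]") // local mode, for the example only
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for myDataframe.
    val myDataframe = Seq(
      ("2021-03-01", 3),
      ("2021-01-01", 1),
      ("2021-02-01", 2)
    ).toDF("date", "value")

    writeSorted(myDataframe, "/tmp/sorted_csv_example")

    spark.stop()
  }
}
```

Note that with .coalesce(1) a single task writes all of the data, so this is best suited to outputs small enough for one executor to handle.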

Upvotes: 1
