Reputation: 3064
I have the following code:
val relationships: RDD[String] = ....
relationships.saveAsTextFile("/tmp/result")
I expected the saveAsTextFile method to save relationships to a single file, /tmp/result. Instead it created a directory, /tmp/result/, and filled it with many text files named part-00000, part-00001, etc. Is this the normal behavior of this method?
Upvotes: 2
Views: 618
Reputation: 72
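That's not strange. As Tim said, it is the expected behaviour, because the computation is distributed.
I just want to add that if your RDD is very large, running coalesce(1) might not be a great idea, because it forces all of the data into a single partition that one task has to process and write.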
Upvotes: 0
Reputation: 3354
It saves multiple files because the computation is distributed: each partition of the RDD is written out as its own part-NNNNN file.
If you need the output in a single file while still using saveAsTextFile, you can use
relationships.coalesce(1, true).saveAsTextFile("/tmp/result")
This runs the computation and then coalesces the result into one partition, so the output directory contains a single part-00000 file. You can also use repartition(1), which is just a wrapper for coalesce() with the shuffle argument set to true.
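A minimal sketch of the coalesce approach, assuming a local Spark installation. The object name SinglePartFile, the helper saveAsSinglePart, the local[2] master, and the sample strings are made up for illustration; the contents of the real relationships RDD are not shown in the question.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object SinglePartFile {

  // Shuffle the RDD into one partition, then write it out.
  // The target is still a directory, but it holds only one part file.
  def saveAsSinglePart(rdd: RDD[String], dir: String): Unit =
    rdd.coalesce(1, shuffle = true).saveAsTextFile(dir)

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("single-part-file").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in data spread over three partitions.
      val relationships: RDD[String] =
        sc.parallelize(Seq("a->b", "b->c", "c->a"), numSlices = 3)

      // saveAsTextFile fails if the directory already exists,
      // so point it at a fresh path inside a temp directory.
      val outDir = Files.createTempDirectory("result").resolve("out").toString

      saveAsSinglePart(relationships, outDir)

      // List the data files Spark produced.
      new java.io.File(outDir).list()
        .filter(_.startsWith("part-"))
        .foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because shuffle is set to true, the upstream stages still run in parallel and only the final write goes through a single task. The line order in the output file is not guaranteed to match the original order.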
As an alternative, if your data fits in the driver's memory, you can collect your RDD and then save the resulting array to a local file yourself.
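The collect alternative could look roughly like this. The object name CollectAndSave and the helper saveToLocalFile are hypothetical, and the sketch assumes the whole RDD fits comfortably in the driver's memory.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object CollectAndSave {

  // Pull every element to the driver and write them to one local file,
  // one element per line. Only safe when the data fits in driver memory.
  def saveToLocalFile(rdd: RDD[String], path: String): Unit = {
    val lines: Array[String] = rdd.collect()
    val bytes = lines.mkString("\n").getBytes(StandardCharsets.UTF_8)
    Files.write(Paths.get(path), bytes)
  }
}
```

With this approach the result is a plain file at the given path on the driver machine, for example CollectAndSave.saveToLocalFile(relationships, "/tmp/result.txt"), rather than a directory of part files.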
Upvotes: 5