Hlib

Reputation: 3064

Strange behavior of 'saveAsTextFile' method

I have the following code:

val relationships: RDD[String] = ....
relationships.saveAsTextFile("/tmp/result")

I expected the 'saveAsTextFile' method to save 'relationships' to a file named '/tmp/result'. Instead, it created a folder '/tmp/result/' and put a lot of text files in it, named part-00000, part-00001, etc. Is this the normal behavior of this method?

Upvotes: 2

Views: 618

Answers (2)

ZeoS

Reputation: 72

That's not strange. As Tim said, that's expected behaviour because the computation is distributed.

I just want to add that if your RDD is very large, coalescing it into a single partition might not be a great idea, because all of the data then has to pass through one task.

Upvotes: 0

gasparms

Reputation: 3354

It saves multiple files because the computation is distributed: each partition of the RDD is written as its own part file.

If you need a single output file from saveAsTextFile, you can call coalesce(1, true) on the RDD before saveAsTextFile(). The result is still a directory, but it contains only one part file.

This basically means doing the computation and then coalescing the result into 1 partition. You can also use repartition(1), which is just a wrapper for coalesce() with the shuffle argument set to true.
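A minimal sketch of the coalesce approach, assuming a local SparkSession and a small stand-in RDD. The sample data, the app name and the output path /tmp/result-single are made up for illustration and are not from the question:

```scala
import org.apache.spark.sql.SparkSession

object SinglePartFile {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("single-part-file")
      .getOrCreate()
    val sc = spark.sparkContext

    // Stand-in for the `relationships` RDD, spread over 4 partitions.
    val relationships = sc.parallelize(Seq("a->b", "b->c", "c->d", "d->e"), numSlices = 4)

    // Shuffle everything into one partition, then write.
    // The output is still a directory, but it holds a single part-00000 file.
    relationships.coalesce(1, shuffle = true).saveAsTextFile("/tmp/result-single")

    // Equivalent call: repartition(1) delegates to coalesce(1, shuffle = true).
    // relationships.repartition(1).saveAsTextFile("/tmp/result-single-2")

    spark.stop()
  }
}
```

Note that saveAsTextFile fails if the target directory already exists, so the path has to be fresh on each run.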

As an alternative, if your data is small enough to fit in the driver's memory, you can collect your RDD and then save the resulting array yourself.
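The collect alternative could look roughly like this. It is a sketch under the assumption that the driver can write to the local path /tmp/result.txt; that path, the sample data and the app name are hypothetical:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._
import org.apache.spark.sql.SparkSession

object CollectAndWrite {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-and-write")
      .getOrCreate()
    val sc = spark.sparkContext

    // Stand-in for the `relationships` RDD.
    val relationships = sc.parallelize(Seq("a->b", "b->c", "c->d", "d->e"), numSlices = 4)

    // collect() pulls every element into the driver's memory as an Array[String].
    val lines: Array[String] = relationships.collect()

    // Write one plain local file (not a directory), one element per line.
    Files.write(Paths.get("/tmp/result.txt"), lines.toSeq.asJava, StandardCharsets.UTF_8)

    spark.stop()
  }
}
```

Unlike saveAsTextFile, this writes to the driver's local file system only, so it is not a replacement when the output has to land on HDFS or another distributed store.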

Upvotes: 5
