Reputation: 611
I've processed a large dataset in Spark and stored the results in HDFS.
However, the saveAsTextFile step seems slower than I'd expect.
So I wonder if there is a way to improve its performance.
My original code (which runs slower than expected):
val data = sc.textFile("data", 200)
data.
  flatMap(_.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  saveAsTextFile("output")
When I add coalesce(1), the speed improves dramatically:
val data = sc.textFile("data", 200)
data.
  flatMap(_.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  coalesce(1).
  saveAsTextFile("output")
Upvotes: 0
Views: 1644
Reputation: 4333
I am guessing your job is running slowly because you are asking for 200 partitions of your input. When you write the output to HDFS, it writes 200 (probably small) files, and you notice the speed-up when you coalesce down to 1.
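If you want to confirm that, you can check how many partitions the final RDD has before the write; saveAsTextFile writes one part-file per partition. Here's a quick sketch against the pipeline from your question (the val name counts is just for illustration):
val counts = data.
  flatMap(_.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _)

// Typically prints 200 here, since reduceByKey reuses the upstream
// partition count (unless spark.default.parallelism overrides it),
// and each of those partitions becomes one file in "output".
println(counts.partitions.length)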
I would suggest removing the 200 partitions from textFile and letting Spark pick the default parallelism:
val data = sc.textFile(inputDir) // no partitions specified
You may still want to keep an eye on the file sizes written out at the end of the job, though. HDFS performs best when file sizes are close to the block size (128 MB by default on recent Hadoop versions).
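If you do want to aim for that, one rough approach (just a sketch; totalOutputBytes is a hypothetical estimate you'd have to supply yourself) is to derive the number of output files from the expected output size:
val blockSize = 128L * 1024 * 1024               // 128 MB HDFS block size
val totalOutputBytes = 2L * 1024 * 1024 * 1024   // hypothetical ~2 GB of output
val numFiles = math.max(1, (totalOutputBytes / blockSize).toInt)

data.
  flatMap(_.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  coalesce(numFiles).    // each partition becomes roughly one block-sized file
  saveAsTextFile("output")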
Another reason more partitions can be slower is that Spark does some setup/teardown work per partition. There is a sweet spot when setting those numbers: take a look at your Spark web UI, and if there's 100 ms of setup/teardown for every 5 ms of real work, for instance, you want fewer partitions.
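One way to pick that number without a separate coalesce step (again just a sketch, and 16 is an arbitrary example value) is to pass the partition count directly to the shuffle:
data.
  flatMap(_.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _, 16).   // both the reduce and the write use 16 partitions
  saveAsTextFile("output")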
I always start with Spark's defaults, though, and tweak from there as needed.
Upvotes: 2