saveAsTextFile performance improvement

Question

I am using a data source with up to 1500000 records.

I have used the following code snippet:

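JavaRDD<String> dataCollection = ctx.textFile("hdfs://yarncluster/Input/datasource");

// cartesian() pairs every line with every line: N lines produce N * N pairs
JavaPairRDD<String, String> rdd = dataCollection.cartesian(dataCollection);

rdd.saveAsTextFile("hdfs://yarncluster/Ouput");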

Saving the data to the cluster takes a long time. Is there any way to improve the performance?
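For a sense of scale, the sketch below estimates how much data the save step has to write. It is plain Java arithmetic, not Spark code. It assumes one record per input line and a hypothetical average line length of 50 bytes (both are assumptions, not from the question). It also assumes ASCII text and that saveAsTextFile writes each pair using Tuple2's toString() form "(a,b)" followed by a newline. Under those assumptions, 1500000 lines give 2.25 × 10^12 pairs and roughly 234 TB of output.

```java
// Back-of-the-envelope size of dataCollection.cartesian(dataCollection).
// Assumption (hypothetical): every input line has the same average length in bytes.
public class CartesianSize {

    // cartesian() pairs every line with every line, so N lines yield N * N pairs.
    static long pairCount(long lines) {
        return Math.multiplyExact(lines, lines);
    }

    // Each pair is written as "(a,b)" plus a newline, so one output line is
    // about 2 * avgLineBytes + 4 bytes (ASCII text assumed).
    static long outputBytes(long lines, long avgLineBytes) {
        long bytesPerPair = 2 * avgLineBytes + 4;
        return Math.multiplyExact(pairCount(lines), bytesPerPair);
    }

    public static void main(String[] args) {
        long lines = 1_500_000L;   // size of the data source from the question
        long avgLineBytes = 50L;   // hypothetical average line length
        long bytes = outputBytes(lines, avgLineBytes);
        System.out.printf("pairs: %,d%n", pairCount(lines));
        System.out.printf("bytes written: %,d (about %d TB)%n",
                bytes, bytes / 1_000_000_000_000L);
    }
}
```

If an estimate with your real line lengths lands anywhere near this, most of the time is likely spent producing and writing the cartesian output, not in saveAsTextFile itself.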
