Durga Viswanath Gadiraju

Reputation: 3956

Saving files in Spark

There are two operations on an RDD to save it: one is saveAsTextFile and the other is saveAsObjectFile. I understand saveAsTextFile, but not saveAsObjectFile. I am new to Spark and Scala, so I am curious about saveAsObjectFile.

1) Is it sequence file from Hadoop or some thing different?

2) Can I read those files which are generated using saveAsObjectFile using Map Reduce? If yes, how?

Upvotes: 1

Views: 5464

Answers (2)

Sumit

Reputation: 1420

  1. saveAsTextFile() - Persists the RDD as a text file, using string representations of the elements. It leverages Hadoop's TextOutputFormat. To produce a compressed file, use the overloaded method that accepts a CompressionCodec as its second argument. Refer to the RDD API.
  2. saveAsObjectFile() - Persists the elements of the RDD as a SequenceFile of serialized objects.

Now, when reading those sequence files back, you can use SparkContext.objectFile("path of file"), which internally leverages Hadoop's SequenceFileInputFormat to read the files.
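For example, a minimal round trip might look like this (a sketch, assuming a running SparkContext `sc` and a hypothetical writable path `/tmp/objfile`):

```scala
// Save an RDD as a SequenceFile of serialized objects.
val rdd = sc.parallelize(Seq(1, 2, 3, 4))
rdd.saveAsObjectFile("/tmp/objfile")  // hypothetical output path

// Read it back; the type parameter tells Spark what to deserialize into.
// objectFile uses SequenceFileInputFormat under the hood.
val restored = sc.objectFile[Int]("/tmp/objfile")
restored.collect()  // the original elements (partition order may vary)
```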

Alternatively, you can use SparkContext.newAPIHadoopFile(...), which accepts Hadoop's InputFormat and the path as parameters.
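This is also what makes the files readable from MapReduce: saveAsObjectFile writes a SequenceFile with NullWritable keys and BytesWritable values, where each value holds a Java-serialized batch of elements. A sketch of reading the raw records this way (same hypothetical path as above):

```scala
import org.apache.hadoop.io.{NullWritable, BytesWritable}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat

// Read the SequenceFile written by saveAsObjectFile as raw key/value pairs.
val raw = sc.newAPIHadoopFile(
  "/tmp/objfile",  // hypothetical path
  classOf[SequenceFileInputFormat[NullWritable, BytesWritable]],
  classOf[NullWritable],
  classOf[BytesWritable])

// raw is an RDD[(NullWritable, BytesWritable)]; deserializing the bytes
// in each value is what sc.objectFile does for you automatically.
```

A MapReduce job can read the same files by configuring SequenceFileInputFormat as its input format and deserializing the BytesWritable payloads itself.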

Upvotes: 3

Atul Soman

Reputation: 4720

rdd.saveAsObjectFile saves the RDD as a SequenceFile. To read those files back, use sparkContext.objectFile("fileName").

Upvotes: 1
