Durga Viswanath Gadiraju

Reputation: 3956

Saving files in Spark

There are two operations on an RDD to save it: one is saveAsTextFile and the other is saveAsObjectFile. I understand saveAsTextFile, but not saveAsObjectFile. I am new to Spark and Scala, so I am curious about saveAsObjectFile.

1) Is it sequence file from Hadoop or some thing different?

2) Can I read those files which are generated using saveAsObjectFile using Map Reduce? If yes, how?

Upvotes: 1

Views: 5464

Answers (2)

Sumit

Reputation: 1420

  1. saveAsTextFile() - Persists the RDD as a text file, using string representations of the elements. It leverages Hadoop's TextOutputFormat. To produce a compressed file, use the overloaded method that accepts a CompressionCodec as its second argument. Refer to the RDD API.
  2. saveAsObjectFile() - Persists the elements of the RDD as a SequenceFile of serialized objects.

Now, when reading those sequence files back, you can use SparkContext.objectFile("path of file"), which internally leverages Hadoop's SequenceFileInputFormat to read the files.
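For example, a minimal round trip might look like this (a sketch, assuming a running SparkContext `sc` and a hypothetical writable path `/tmp/objfile`):

```scala
// Save an RDD as a SequenceFile of serialized objects.
val rdd = sc.parallelize(Seq(1, 2, 3, 4))
rdd.saveAsObjectFile("/tmp/objfile")  // hypothetical output path

// Read it back; the type parameter tells Spark what to deserialize into.
// objectFile uses SequenceFileInputFormat under the hood.
val restored = sc.objectFile[Int]("/tmp/objfile")
restored.collect()  // the original elements (partition order may vary)
```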

Alternatively, you can use SparkContext.newAPIHadoopFile(...), which accepts Hadoop's InputFormat and the path as parameters.
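This is also what makes the files readable from MapReduce: saveAsObjectFile writes a SequenceFile with NullWritable keys and BytesWritable values, where each value holds a Java-serialized batch of elements. A sketch of reading the raw records this way (same hypothetical path as above):

```scala
import org.apache.hadoop.io.{NullWritable, BytesWritable}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat

// Read the SequenceFile written by saveAsObjectFile as raw key/value pairs.
val raw = sc.newAPIHadoopFile(
  "/tmp/objfile",  // hypothetical path
  classOf[SequenceFileInputFormat[NullWritable, BytesWritable]],
  classOf[NullWritable],
  classOf[BytesWritable])

// raw is an RDD[(NullWritable, BytesWritable)]; deserializing the bytes
// in each value is what sc.objectFile does for you automatically.
```

A MapReduce job can read the same files by configuring SequenceFileInputFormat as its input format and deserializing the BytesWritable payloads itself.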

Upvotes: 3

Atul Soman

Reputation: 4720

rdd.saveAsObjectFile saves the RDD as a SequenceFile. To read those files back, use sparkContext.objectFile("fileName").

Upvotes: 1
