Choix

Reputation: 575

Scala: saveAsSequenceFile is not available on my RDD, although the docs say it is allowed

I am using Spark 1.6. According to the official documentation, an RDD can be saved in sequence file format. However, for my RDD textFile I get:

scala> textFile.saveAsSequenceFile("products_sequence")
<console>:30: error: value saveAsSequenceFile is not a member of org.apache.spark.rdd.RDD[String]

I searched and found similar discussions that seem to suggest this works in PySpark. Is my understanding of the official documentation wrong? Can saveAsSequenceFile() be used in Scala?

Upvotes: 0

Views: 1821

Answers (1)

Knows Not Much

Reputation: 31546

saveAsSequenceFile is only available when the RDD contains key-value pairs. It is not a method of RDD itself: it is defined in SequenceFileRDDFunctions, which, like PairRDDFunctions, is added to an RDD[(K, V)] through an implicit conversion.

https://spark.apache.org/docs/2.1.1/api/scala/index.html#org.apache.spark.rdd.SequenceFileRDDFunctions

You can see that the API definition takes a K and a V, so an RDD[String] does not qualify.

If you change your code to the following:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.rdd._

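object SequenceFileExample extends App {
   val conf = new SparkConf().setAppName("sequenceFile").setMaster("local[1]")
   val sc = new SparkContext(conf)
   // An RDD of (key, value) pairs, so saveAsSequenceFile is available
   val rdd : RDD[(String, String)] = sc.parallelize(List(("foo", "foo1"), ("bar", "bar1"), ("baz", "baz1")))
   // Writes an output directory named foo.seq containing the part files
   rdd.saveAsSequenceFile("foo.seq")
   sc.stop()
}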

This works, and you will get a foo.seq output directory containing the sequence file parts. It works because the RDD holds key-value pairs rather than plain strings, i.e. it is an RDD[(String, String)] and not just an RDD[String].
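For the RDD[String] from the question, one possible approach is to attach a key to every line before saving. The sketch below is only an illustration: the object name TextToSequenceFile, the sample data, and the choice of the line index as the key are all assumptions, and in practice textFile would come from sc.textFile(...) rather than parallelize.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; the name and sample data are not from the question.
object TextToSequenceFile {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("textToSequenceFile").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for the question's `textFile` RDD.
      val textFile: RDD[String] = sc.parallelize(Seq("foo", "bar", "baz"))

      // zipWithIndex yields (line, index); swap so the index becomes the key.
      val pairs: RDD[(Long, String)] =
        textFile.zipWithIndex().map { case (line, idx) => (idx, line) }

      // Now an RDD of key-value pairs, so saveAsSequenceFile is in scope.
      // The output path must not already exist.
      pairs.saveAsSequenceFile("products_sequence")

      // Read the data back to check the round trip.
      val restored: RDD[(Long, String)] =
        sc.sequenceFile[Long, String]("products_sequence")
      restored.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The key is arbitrary here. If the lines carry a natural key (for example a product id in the first column), mapping each line to (id, line) would probably be a better fit than the line index.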

Upvotes: 1
