Reputation: 575
I am using Spark 1.6. According to the official docs, an RDD can be saved in SequenceFile format, but I get this for my RDD textFile:
scala> textFile.saveAsSequenceFile("products_sequence")
<console>:30: error: value saveAsSequenceFile is not a member of org.apache.spark.rdd.RDD[String]
I googled and found similar discussions suggesting that this works in PySpark. Is my understanding of the official docs wrong? Can saveAsSequenceFile() be used in Scala?
Upvotes: 0
Views: 1821
Reputation: 31546
saveAsSequenceFile is only available when the RDD contains key-value pairs. The reason is that it is defined in SequenceFileRDDFunctions, which is only added (through an implicit conversion) to an RDD[(K, V)], not to a plain RDD[String]:
https://spark.apache.org/docs/2.1.1/api/scala/index.html#org.apache.spark.rdd.SequenceFileRDDFunctions
You can see that the API definition takes a key type K and a value type V.
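For reference, here is a simplified, hand-written sketch of that declaration (not the real source; the actual class also requires implicit conversions from K and V to Hadoop Writable types), which shows why the method never resolves on an RDD[String]:

import org.apache.hadoop.io.compress.CompressionCodec
import org.apache.spark.rdd.RDD

// Sketch only: saveAsSequenceFile lives on a wrapper around RDD[(K, V)].
class SequenceFileRDDFunctionsSketch[K, V](self: RDD[(K, V)]) {
  def saveAsSequenceFile(path: String, codec: Option[Class[_ <: CompressionCodec]] = None): Unit = ???
}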
If you change your code to build a pair RDD, for example:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.rdd._

object SequenceFile extends App {
  val conf = new SparkConf().setAppName("sequenceFile").setMaster("local[1]")
  val sc = new SparkContext(conf)

  // A pair RDD: each element is a (key, value) tuple.
  val rdd: RDD[(String, String)] = sc.parallelize(List(("foo", "foo1"), ("bar", "bar1"), ("baz", "baz1")))
  rdd.saveAsSequenceFile("foo.seq")
  sc.stop()
}
This works and you will get the foo.seq output directory. It works because the RDD contains key-value pairs, not just a plain RDD[String].
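If you want to keep the RDD[String] from the question, one option is to pair each line with something first, for example its line index. This is a sketch: "products_sequence" comes from the question, while the input path and the choice of key are just assumptions:

import org.apache.spark.rdd.RDD

// Key each line by its index so the pair-RDD implicits apply.
// "products.txt" is a hypothetical input path; sc is the shell's SparkContext.
val textFile: RDD[String] = sc.textFile("products.txt")
val keyed: RDD[(Long, String)] = textFile.zipWithIndex().map(_.swap)
keyed.saveAsSequenceFile("products_sequence")

// Reading it back also goes through the key/value API:
val restored: RDD[(Long, String)] = sc.sequenceFile[Long, String]("products_sequence")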
Upvotes: 1