ahoffer
ahoffer

Reputation: 6546

sortByKey in Spark

New to Spark and Scala. Trying to sort a word counting example. My code is based on this simple example. I want to sort the results alphabetically by key. If I add the key sort to an RDD:

 val wordCounts = names.map((_, 1)).reduceByKey(_ + _).sortByKey()

then I get a compile error:

error: No implicit view available from java.io.Serializable => Ordered[java.io.Serializable].
[INFO]     val wordCounts = names.map((_, 1)).reduceByKey(_ + _).sortByKey()

I don't know what the lack of an implicit view means. Can someone tell me how to fix it? I am running the Cloudera 5 Quickstart VM. I think it bundles Spark version 0.9.

Source of the Scala job

import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SparkWordCount {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))

    val files = sc.textFile(args(0)).map(_.split(","))

    def f(x:Array[String]) = {
      if (x.length > 3)
        x(3)
      else
        Array("NO NAME")
   }

    val names = files.map(f)

    val wordCounts = names.map((_, 1)).reduceByKey(_ + _).sortByKey()

    System.out.println(wordCounts.collect().mkString("\n"))
  }
}

Some (unsorted) output

("INTERNATIONAL EYELETS INC",879)
("SHAQUITA SALLEY",865)
("PAZ DURIGA",791)
("TERESSA ALCARAZ",824)
("MING CHAIX",878)
("JACKSON SHIELDS YEISER",837)
("AUDRY HULLINGER",875)
("GABRIELLE MOLANDS",802)
("TAM TACKER",775)
("HYACINTH VITELA",837)

Upvotes: 4

Views: 9326

Answers (1)

aaronman
aaronman

Reputation: 18761

No implicit view means there is no scala function like this defined

implicit def SerializableToOrdered(x :java.io.Serializable) = new Ordered[java.io.Serializable](x) //note this function doesn't work

The reason this error is coming out is because in your function you are returning two different types with a super type of java.io.Serializable (ones a String the other an Array[String]). Also reduceByKey for obvious reasons requires the key to be an Orderable. Fix it like this

object SparkWordCount {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))

    val files = sc.textFile(args(0)).map(_.split(","))

    def f(x:Array[String]) = {
      if (x.length > 3)
        x(3)
      else
        "NO NAME"
    }

    val names = files.map(f)

    val wordCounts = names.map((_, 1)).reduceByKey(_ + _).sortByKey()

    System.out.println(wordCounts.collect().mkString("\n"))
  }
}

Now the function just returns Strings instead of two different types

Upvotes: 3

Related Questions