RandomBookmark

Reputation: 67

reduceByKey is not a member

Hi, I have code that simply gets word counts from a document. I also need to use a map to look up a data value before generating the output. Here is the code.

   requests
    .filter(_.description.exists(_.length > 0))
    .flatMap { case request =>
      broadcastDataMap.value.get(request.requestId).map {
        data =>
          val text = Seq(
            data.name,
            data.taxonym,
            data.pluralTaxonym,
            request.description.get
          ).mkString(" ")
          getWordCountsInDocument(text).map { case (word, count) =>
            (word, Map(request.requestId -> count))
          }
      }
    }
    .reduceByKey(mergeMap)

The error message is

reduceByKey is not a member of org.apache.spark.rdd.RDD[scala.collection.immutable.Map[String,scala.collection.immutable.Map[Int,Int]]]

How can I resolve this? I do need to call getWordCountsInDocument. Thanks!

Upvotes: 4

Views: 5889

Answers (1)

Angelo Genovese

Reputation: 3398

reduceByKey is a member of PairRDDFunctions; it is added implicitly to RDDs of the form RDD[(K, V)]. You probably need to flatten your structure out into an RDD[(String, Map[Int, Int])].
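For intuition, reduceByKey combines all values sharing a key with the supplied function. Spark isn't runnable here, so this is a sketch of the equivalent with plain Scala collections; the mergeMap shown is a hypothetical stand-in, since the question doesn't include its definition:

```scala
// Hypothetical mergeMap: combines two per-request count maps by summing counts.
def mergeMap(a: Map[Int, Int], b: Map[Int, Int]): Map[Int, Int] =
  b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v) }

// reduceByKey on an RDD[(K, V)] behaves like grouping by key and
// reducing each group's values:
val pairs = Seq(
  ("cat", Map(1 -> 2)),
  ("cat", Map(2 -> 1)),
  ("dog", Map(1 -> 3))
)

val reduced = pairs.groupBy(_._1).map { case (k, vs) =>
  k -> vs.map(_._2).reduce(mergeMap)
}
// reduced("cat") == Map(1 -> 2, 2 -> 1)
// reduced("dog") == Map(1 -> 3)
```

This only works when the elements are (key, value) tuples, which is exactly why the implicit conversion to PairRDDFunctions doesn't fire on an RDD[Map[String, Map[Int, Int]]].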

If you can provide the types for your inputs (requests, broadcastDataMap and mergeMap) we may be able to provide some help with that conversion.

From the types provided, and assuming the return type of getWordCountsInDocument is some Collection[(String, Int)] of word–count pairs:

Changing:

broadcastDataMap.value.get(request.requestId).map {

to

broadcastDataMap.value.get(request.requestId).flatMap {

should fix the issue.
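To see why, here is a minimal sketch with plain Scala collections standing in for the RDD (Seq.flatMap flattens one layer, just like RDD.flatMap); all names here are illustrative, not taken from the question:

```scala
val requests = Seq(1, 2)
val lookup   = Map(1 -> "a") // stands in for broadcastDataMap.value

// With map inside flatMap: the outer flatMap only flattens the Option
// layer, so each element is still a whole word-count Map — no tuples,
// hence no reduceByKey.
val nested: Seq[Map[String, Int]] =
  requests.flatMap { id => lookup.get(id).map { v => Map(v -> 1) } }

// Flattening one more level (flatMap instead of map on the lookup
// result) emits the individual (word, counts) pairs, giving the
// RDD[(K, V)] shape that reduceByKey needs.
val flat: Seq[(String, Int)] =
  requests.flatMap { id => lookup.get(id).toSeq.flatMap { v => Map(v -> 1) } }
```

Note the explicit `.toSeq` here: on an Option, `flatMap` with a function returning a collection relies on the option2Iterable implicit conversion, which newer Scala versions may warn about, so spelling it out keeps the sketch unambiguous.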

Upvotes: 3
