Reputation: 67
Hi, I have code that simply gets word counts from a document. I also need to use a map to look up a data value before generating the output. Here is the code:
requests
  .filter(_.description.exists(_.length > 0))
  .flatMap { case request =>
    broadcastDataMap.value.get(request.requestId).map { data =>
      val text = Seq(
        data.name,
        data.taxonym,
        data.pluralTaxonym,
        request.description.get
      ).mkString(" ")
      getWordCountsInDocument(text).map { case (word, count) =>
        (word, Map(request.requestId -> count))
      }
    }
  }
  .reduceByKey(mergeMap)
The error message is:
reduceByKey is not a member of org.apache.spark.rdd.RDD[scala.collection.immutable.Map[String,scala.collection.immutable.Map[Int,Int]]]
How can I resolve this? I do need to call getWordCountsInDocument. Thanks!
Upvotes: 4
Views: 5889
Reputation: 3398
reduceByKey is a member of PairRDDFunctions; it gets added implicitly to any RDD of pairs, i.e. an RDD[(K, V)]. You probably need to flatten your structure out into an RDD[(String, Map[Int, Int])].
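As a minimal sketch of where that implicit kicks in (assuming an existing SparkContext named sc):

import org.apache.spark.rdd.RDD

// Elements are pairs, so RDD.rddToPairRDDFunctions applies and
// reduceByKey is available:
val pairs: RDD[(String, Int)] = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
val summed: RDD[(String, Int)] = pairs.reduceByKey(_ + _)

// Elements are Maps, not pairs, so the conversion does not apply
// and reduceByKey does not compile:
val maps: RDD[Map[String, Int]] = sc.parallelize(Seq(Map("a" -> 1)))
// maps.reduceByKey(...)  // error: value reduceByKey is not a member of ...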
If you can provide the types of your inputs (requests, broadcastDataMap, and mergeMap), we may be able to help with that conversion.
From the types provided, and assuming that the return type of getWordCountsInDocument is some collection of (word: String, count: Int) pairs, changing:
broadcastDataMap.value.get(request.requestId).map {
to
broadcastDataMap.value.get(request.requestId).flatMap {
should fix the issue.
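Why this works: Option.flatMap itself only accepts an Option-returning function, so with a collection-returning body the compiler falls back to the standard Option.option2Iterable conversion. The outer RDD.flatMap then receives an Iterable[(String, Map[Int, Int])] rather than an Option[Map[...]], giving you an RDD[(String, Map[Int, Int])] that reduceByKey accepts. As a sketch, the full pipeline after the change (the Request/Data fields and the mergeMap signature below are assumptions inferred from the question):

// Sketch only; these shapes are assumed from the question:
//   case class Data(name: String, taxonym: String, pluralTaxonym: String)
//   case class Request(requestId: Int, description: Option[String])
//   def getWordCountsInDocument(text: String): Map[String, Int]
//   def mergeMap(a: Map[Int, Int], b: Map[Int, Int]): Map[Int, Int]

val wordCounts: RDD[(String, Map[Int, Int])] =
  requests
    .filter(_.description.exists(_.length > 0))
    .flatMap { request =>
      // The Option is widened to Iterable via option2Iterable, so the
      // outer flatMap flattens to individual (word, Map(id -> count)) pairs
      broadcastDataMap.value.get(request.requestId).flatMap { data =>
        val text = Seq(
          data.name,
          data.taxonym,
          data.pluralTaxonym,
          request.description.get
        ).mkString(" ")
        getWordCountsInDocument(text).map { case (word, count) =>
          (word, Map(request.requestId -> count))
        }
      }
    }
    .reduceByKey(mergeMap)

If you would rather not lean on the implicit conversion, broadcastDataMap.value.get(request.requestId).toSeq.flatMap { data => ... } makes the flattening explicit with the same result.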
Upvotes: 3