Reputation: 67
Hi, I have code that simply gets word counts from a document. I also need to use a map to look up a data value before generating the output. Here is the code:
requests
  .filter(_.description.exists(_.length > 0))
  .flatMap { case request =>
    broadcastDataMap.value.get(request.requestId).map { data =>
      val text = Seq(
        data.name,
        data.taxonym,
        data.pluralTaxonym,
        request.description.get
      ).mkString(" ")
      getWordCountsInDocument(text).map { case (word, count) =>
        (word, Map(request.requestId -> count))
      }
    }
  }
  .reduceByKey(mergeMap)
The error message is:
reduceByKey is not a member of org.apache.spark.rdd.RDD[scala.collection.immutable.Map[String,scala.collection.immutable.Map[Int,Int]]]
How can I resolve this? I do need to call getWordCountsInDocument. Thanks!
Upvotes: 4
Views: 5889
Reputation: 3398
reduceByKey is a member of PairRDDFunctions; it gets added implicitly to any RDD of pairs, i.e. an RDD[(K, V)]. You probably need to flatten your structure out into an RDD[(String, Map[Int, Int])].
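As a minimal sketch of where that implicit kicks in (assuming an existing SparkContext named sc):

import org.apache.spark.rdd.RDD

// Elements are pairs, so RDD.rddToPairRDDFunctions applies and
// reduceByKey is available:
val pairs: RDD[(String, Int)] = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
val summed: RDD[(String, Int)] = pairs.reduceByKey(_ + _)

// Elements are Maps, not pairs, so the conversion does not apply
// and reduceByKey does not compile:
val maps: RDD[Map[String, Int]] = sc.parallelize(Seq(Map("a" -> 1)))
// maps.reduceByKey(...)  // error: value reduceByKey is not a member of ...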
If you can provide the types of your inputs (requests, broadcastDataMap, and mergeMap), we may be able to help with that conversion.
From the types provided, and assuming that the return type of getWordCountsInDocument is some collection of (word: String, count: Int) pairs, changing:
broadcastDataMap.value.get(request.requestId).map {
to
broadcastDataMap.value.get(request.requestId).flatMap {
should fix the issue.
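Why this works: Option.flatMap itself only accepts an Option-returning function, so with a collection-returning body the compiler falls back to the standard Option.option2Iterable conversion. The outer RDD.flatMap then receives an Iterable[(String, Map[Int, Int])] rather than an Option[Map[...]], giving you an RDD[(String, Map[Int, Int])] that reduceByKey accepts. As a sketch, the full pipeline after the change (the Request/Data fields and the mergeMap signature below are assumptions inferred from the question):

// Sketch only; these shapes are assumed from the question:
//   case class Data(name: String, taxonym: String, pluralTaxonym: String)
//   case class Request(requestId: Int, description: Option[String])
//   def getWordCountsInDocument(text: String): Map[String, Int]
//   def mergeMap(a: Map[Int, Int], b: Map[Int, Int]): Map[Int, Int]

val wordCounts: RDD[(String, Map[Int, Int])] =
  requests
    .filter(_.description.exists(_.length > 0))
    .flatMap { request =>
      // The Option is widened to Iterable via option2Iterable, so the
      // outer flatMap flattens to individual (word, Map(id -> count)) pairs
      broadcastDataMap.value.get(request.requestId).flatMap { data =>
        val text = Seq(
          data.name,
          data.taxonym,
          data.pluralTaxonym,
          request.description.get
        ).mkString(" ")
        getWordCountsInDocument(text).map { case (word, count) =>
          (word, Map(request.requestId -> count))
        }
      }
    }
    .reduceByKey(mergeMap)

If you would rather not lean on the implicit conversion, broadcastDataMap.value.get(request.requestId).toSeq.flatMap { data => ... } makes the flattening explicit with the same result.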
Upvotes: 3