poliu2s

Reputation: 657

Apache Spark: Hashmap accumulators give type mismatch error

I am trying to declare a hashmap as Spark (v0.9.1) accumulator. The docs state that "Spark natively supports accumulators of numeric value types and standard mutable collections..." (link).

However, this doesn't seem to be working for me when I try to create a HashMap[String, Boolean]:

scala> import collection.mutable.HashMap
import collection.mutable.HashMap

scala> val accum = sc.accumulator("test" -> true)(HashMap)
<console>:13: error: type mismatch;
 found   : scala.collection.mutable.HashMap.type
 required: org.apache.spark.AccumulatorParam[(String, Boolean)]
           val accum = sc.accumulator("test" -> true)(HashMap)

Upvotes: 2

Views: 6473

Answers (2)

tkroman

Reputation: 4808

First of all, you should pass an actual HashMap[String, Boolean] rather than a (String, Boolean) tuple:

sc.accumulator(HashMap("t" -> true))

You may also need to write your own AccumulatorParam, because I didn't find an out-of-the-box implicit for HashMap:

import org.apache.spark.AccumulatorParam

implicit object iHashMap extends AccumulatorParam[HashMap[String, Boolean]] {
  def zero(m: HashMap[String, Boolean]) = HashMap[String, Boolean]()
  def addInPlace(m1: HashMap[String, Boolean], m2: HashMap[String, Boolean]) = m1 ++ m2
}

The behaviour is probably not what you personally would expect, but I hope you catch the gist.
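To make the surprise concrete: `addInPlace` above uses `++`, and when both maps contain the same key, the right-hand map's entry wins, so conflicting values are silently overwritten rather than combined. A minimal sketch of that merge semantics in plain Scala (no SparkContext needed):

```scala
import scala.collection.mutable.HashMap

val m1 = HashMap("a" -> true, "b" -> false)
val m2 = HashMap("b" -> true)

// `++` keeps m2's entry for keys present in both maps,
// so m1's "b" -> false is discarded
val merged = m1 ++ m2
// merged is HashMap("a" -> true, "b" -> true)
```

If you want conflicting values to be combined (e.g. with `||` for Booleans), you have to write that logic yourself inside `addInPlace`.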

You may also find some really useful examples here: https://github.com/apache/spark/blob/60abc252545ec7a5d59957a32e764cd18f6c16b4/core/src/test/scala/org/apache/spark/AccumulatorSuite.scala

Upvotes: 7

fedragon

Reputation: 884

I remember having the same issue; here's a small gist that uses a HashMap[String, Int] as an accumulator in Spark: HashMapParam.scala

If a key k already exists in the accumulator with a value v1 and we try to put k -> v2 in it, the resulting accumulator will contain k -> v1 + v2.

This does not completely answer your question but could be helpful to build your own implementation.
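The merge rule described above (existing value v1 plus incoming value v2 for the same key) can be sketched as a plain `addInPlace` function; this is an illustration of the idea, not the code from the linked gist:

```scala
import scala.collection.mutable.HashMap

// Merge m2 into m1, summing the values of keys present in both maps --
// the k -> v1 + v2 rule described in the answer above.
def addInPlace(m1: HashMap[String, Int], m2: HashMap[String, Int]): HashMap[String, Int] = {
  m2.foreach { case (k, v) =>
    m1(k) = m1.getOrElse(k, 0) + v
  }
  m1
}
```

Wrapping this in an `AccumulatorParam[HashMap[String, Int]]` (with `zero` returning an empty map) gives you a counting accumulator for Spark 0.9.x.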

Upvotes: 7
