Nav

Reputation: 31

spark aggregateByKey- giving a type mismatch error

I want output like this: (167, 5), i.e. the sum of the ages and the count.

val people = List(("Ganga",43,'F'),("John",28,'M'),("Lolitha",33,'F'),("Don't Know",18,'T'))
val peopleRDD = sc.parallelize(people)
val kv = peopleRDD.map(x=>((x._1,x._3),x._2))
val result = kv.aggregateByKey(0,0)((x : (Int,Int) , y : Int)=> (x._1+y,x._2+1),(x:(Int,Int),y:(Int,Int))=>(x._1+y._1,x._2+y._2))

I'm getting this error on result:

<console>:33: error: type mismatch;
 found   : (Int, Int)
 required: Int

val result = kv.aggregateByKey(0,0)((x : (Int,Int) , y : Int)=> (x._1+y,x._2+1),(x:(Int,Int),y:(Int,Int))=>(x._1+y._1,x._2+y._2))

Upvotes: 1

Views: 210

Answers (1)

T. Gawęda

Reputation: 16076

It's because you need an additional pair of ():

val result = kv.aggregateByKey((0,0))((x : (Int,Int) , y : Int)=> (x._1+y,x._2+1),(x:(Int,Int),y:(Int,Int))=>(x._1+y._1,x._2+y._2))
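
To see why the zero value must already be a tuple, here is a minimal plain-Scala sketch (no Spark needed, using the ages from the question) of the same (sum, count) accumulation that the seqOp performs:

val ages = List(43, 28, 33, 18)
// The zero value is a tuple, just like the (0,0) above; the function mirrors the seqOp.
val (sum, count) = ages.foldLeft((0, 0)) {
  case ((s, c), age) => (s + age, c + 1)
}
// sum = 122, count = 4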

The Scala compiler gets confused about the parameters. You can reproduce it with a simple function:

def func[T](arg: T)(other : (T, Int) => T) = arg

When you use:

func(0,0)((a : Int, b : Int) => a)

You will get the same error. What's interesting, if the second argument of other is for example String, everything works fine.

To avoid this compiler confusion, you just need one additional pair of () to tell the compiler that you want a tuple.
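
For comparison, a minimal REPL sketch (assuming Scala 2, no Spark on the classpath) contrasting the failing call with the tupled one:

def func[T](arg: T)(other: (T, Int) => T) = arg

// Without the extra parentheses the call fails to type-check,
// much like the aggregateByKey example:
// func(0, 0)((a: Int, b: Int) => a)

// With the extra (), T is inferred as (Int, Int) and the call compiles:
func((0, 0))((t: (Int, Int), n: Int) => (t._1 + n, t._2 + 1))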

Upvotes: 1
