Reputation: 31
I want output like this: (167, 5), that is, the sum of the ages and the count.
val people = List(("Ganga",43,'F'),("John",28,'M'),("Lolitha",33,'F'),("Don't Know",18,'T'))
val peopleRDD = sc.parallelize(people)
val kv = peopleRDD.map(x=>((x._1,x._3),x._2))
val result = kv.aggregateByKey(0,0)((x : (Int,Int) , y : Int)=> (x._1+y,x._2+1),(x:(Int,Int),y:(Int,Int))=>(x._1+y._1,x._2+y._2))
I am getting this error on result:
<console>:33: error: type mismatch; found : (Int, Int) required: Int
val result = kv.aggregateByKey(0,0)((x : (Int,Int) , y : Int)=> (x._1+y,x._2+1),(x:(Int,Int),y:(Int,Int))=>(x._1+y._1,x._2+y._2))
Upvotes: 1
Views: 210
Reputation: 16076
It's because you need an additional pair of parentheses around the zero value:
val result = kv.aggregateByKey((0,0))((x : (Int,Int) , y : Int)=> (x._1+y,x._2+1),(x:(Int,Int),y:(Int,Int))=>(x._1+y._1,x._2+y._2))
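For reference, here is a minimal sketch of what that (sum, count) aggregation computes, written with plain Scala collections instead of an RDD so it runs without Spark. The object name SumCountSketch and the val names are just for illustration. Note that because the key is (name, gender), the keyed version gives one (sum, count) pair per key; a single pair over everyone needs a fold over all the rows.

```scala
object SumCountSketch {
  val people = List(("Ganga", 43, 'F'), ("John", 28, 'M'), ("Lolitha", 33, 'F'), ("Don't Know", 18, 'T'))

  // Per-key (sum, count), mirroring the keyed aggregation with key (name, gender)
  val perKey: Map[(String, Char), (Int, Int)] =
    people
      .groupBy(p => (p._1, p._3))
      .map { case (key, rows) =>
        key -> rows.foldLeft((0, 0)) { case ((sum, count), row) => (sum + row._2, count + 1) }
      }

  // A single (sum, count) over all people, starting from the tuple zero value (0, 0)
  val total: (Int, Int) =
    people.foldLeft((0, 0)) { case ((sum, count), p) => (sum + p._2, count + 1) }
}
```

With the four people listed above, total is (122, 4), and each key in perKey maps to that person's (age, 1).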
The Scala compiler gets confused about the parameters. You can reproduce it with a simple function:
def func[T](arg: T)(other : (T, Int) => T) = arg
When you use:
func(0,0)((a : Int, b : Int) => a)
You will get the same error. Interestingly, if the second parameter of other
is, for example, a String, everything works fine.
To avoid this compiler confusion, just add one additional pair of parentheses to tell the compiler that you want a tuple.
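One plausible reading of the confusion, as far as I can tell: aggregateByKey is overloaded, and one overload takes (zeroValue, numPartitions), so two bare arguments can be matched against it with the zero value typed as Int. The sketch below uses hypothetical stand-ins (OverloadSketch and agg are not Spark APIs) that only mimic the shape of the first parameter lists, to show how the extra parentheses change which overload is picked.

```scala
object OverloadSketch {
  private val ages = List(43, 28)

  // Hypothetical stand-ins: one overload takes only a zero value,
  // the other takes a zero value plus an Int (like numPartitions).
  def agg[U](zero: U)(seqOp: (U, Int) => U): U =
    ages.foldLeft(zero)(seqOp)
  def agg[U](zero: U, numPartitions: Int)(seqOp: (U, Int) => U): U =
    ages.foldLeft(zero)(seqOp)

  // Two bare arguments: the two-parameter overload is chosen, so U = Int.
  val asInt: Int = agg(0, 0)((acc: Int, age: Int) => acc + age)

  // One tuple argument: the single-parameter overload is chosen, so U = (Int, Int).
  val asTuple: (Int, Int) =
    agg((0, 0))((acc: (Int, Int), age: Int) => (acc._1 + age, acc._2 + 1))
}
```

Under this reading, passing a function over (Int, Int) after agg(0, 0) would be rejected for the same reason as in the question, because U has already been fixed to Int.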
Upvotes: 1