Recently I had a scenario where I needed to store data as key-value pairs, and I came across the function reduceByKey(_ ++ _). This is shorthand syntax, and I am not able to understand what it actually means.
For example, reduceByKey(_ + _) means reduceByKey((a, b) => a + b).
So what does reduceByKey(_ ++ _) mean?
I am able to create key-value pairs out of my data using reduceByKey(_ ++ _):
val y = sc.textFile("file:///root/My_Spark_learning/reduced.txt")
y.map(value=>value.split(","))
.map(value=>(value(0),value(1),value(2)))
.collect
.foreach(println)
(1,2,3)
(1,3,4)
(4,5,6)
(7,8,9)
y.map(value=>value.split(","))
.map(value=>(value(0),Seq(value(1),value(2))))
.reduceByKey(_ ++ _)
.collect
.foreach(println)
(1,List(2, 3, 3, 4))
(4,List(5, 6))
(7,List(8, 9))
Upvotes: 6
reduceByKey(_ ++ _) translates to reduceByKey((a, b) => a ++ b).
++ is a method defined on List that concatenates another list to it.
So, for key 1 in the sample data, a will be List(2,3) and b will be List(3,4), and hence the concatenation List(2,3) ++ List(3,4) yields List(2,3,3,4).
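To make the shorthand concrete, here is a minimal sketch in plain Scala (no Spark needed) that puts the placeholder form and the explicit lambda side by side. The object and value names are made up for illustration; the values are strings because that is what split(",") produces in the question.

```scala
object PlaceholderDemo {
  // Two lists like the ones the question builds for key 1.
  val a: List[String] = List("2", "3")
  val b: List[String] = List("3", "4")

  // Placeholder syntax: each underscore stands for the next parameter in order.
  val short: (List[String], List[String]) => List[String] = _ ++ _

  // The same function written out as an explicit lambda.
  val full: (List[String], List[String]) => List[String] = (x, y) => x ++ y

  def main(args: Array[String]): Unit = {
    println(short(a, b)) // List(2, 3, 3, 4)
    println(full(a, b))  // List(2, 3, 3, 4)
  }
}
```

Both functions give the same result, which is why reduceByKey accepts either spelling.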
Upvotes: 8
reduceByKey(_ ++ _) is equivalent to reduceByKey((x, y) => x ++ y).
reduceByKey takes a function of two parameters, applies it to pairs of values that share the same key, and returns one combined value per key.
Here each value starts out as a Seq, and ++ simply concatenates two collections, combining the elements of both.
For each key, it keeps appending to the list. In your case, for key 1, x will be List(2,3) and y will be List(3,4), and ++ will combine both into List(2,3,3,4).
If you had another value like (1,4,5), then x would be the list accumulated so far, List(2,3,3,4), y would be List(4,5), and the result would be List(2,3,3,4,4,5). (This assumes the values are merged in input order; Spark does not guarantee the merge order across partitions.)
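The per-key accumulation can be sketched without a cluster. The snippet below is only a local approximation: reduceByKeyLocal is a hypothetical helper written for this example, not part of Spark, and it merges values strictly in input order, which the real reduceByKey does not promise across partitions. The row (1,4,5) is the extra hypothetical value mentioned above.

```scala
object ReduceByKeySketch {
  // Hypothetical local stand-in for RDD.reduceByKey: group the pairs by key,
  // then combine each group's values left to right with the given function.
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  // The question's sample rows, plus the extra (1,4,5) row.
  val rows: Seq[(String, Seq[String])] = Seq(
    "1" -> Seq("2", "3"),
    "1" -> Seq("3", "4"),
    "1" -> Seq("4", "5"),
    "4" -> Seq("5", "6"),
    "7" -> Seq("8", "9")
  )

  // Same shorthand as in the question: concatenate the two Seqs for a key.
  val result: Map[String, Seq[String]] = reduceByKeyLocal(rows)(_ ++ _)

  def main(args: Array[String]): Unit =
    // Sort by key only to make the printed order stable.
    result.toSeq.sortBy(_._1).foreach(println)
}
```

For key 1 the function is applied twice: first to List(2,3) and List(3,4), then to that result and List(4,5).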
Upvotes: 1