Rohan Nayak
Rohan Nayak

Reputation: 233

What is the meaning for reduceByKey(_ ++ _)

Recently I had scenario to store the the data in keyValue Pair and came across a function reduceByKey(_ ++ _) . This is more of shorthand syntax. I am not able to understand what this actually means.

Ex: reduceBykey(_ + _) means reduceByKey((a,b)=>(a+b))

So reduceByKey(_ ++ _) means ??

I am able to create Key value pair out of data using reduceByKey(_ ++ _).

val y = sc.textFile("file:///root/My_Spark_learning/reduced.txt")

y.map(value=>value.split(","))
  .map(value=>(value(0),value(1),value(2)))
  .collect
  .foreach(println)

(1,2,3)
(1,3,4)
(4,5,6)
(7,8,9)

y.map(value=>value.split(","))
  .map(value=>(value(0),Seq(value(1),value(2))))
  .reduceByKey(_ ++ _)
  .collect
  .foreach(println)

(1,List(2, 3, 3, 4))
(4,List(5, 6))
(7,List(8, 9))

Upvotes: 6

Views: 4943

Answers (2)

rogue-one
rogue-one

Reputation: 11587

reduceByKey(_ ++ _) translates to reduceByKey((a,b) => a ++ b).

++ is a method defined on List that concatenates another list to it.

So, for key 1 in the sample data, a will be List(2,3) and b will be List(3,4) and hence the concatenation of List(2,3) and List(3,4) (List(2,3) ++ List(3,4)) would yield List(2,3,3,4).

Upvotes: 8

koiralo
koiralo

Reputation: 23119

reduceByKey(_ ++ _) is equivalent to reduceByKey((x,y)=> x ++ y) reduceByKey takes two parameters, apply a function and returns

At the first it crates a set and ++ just adds collections together, combining elements of both sets.

For each key It keeps appending in the list. In your case of 1 as a key x will be List(2,3) and y will List (3,4) and ++ will add both as List (2,3,3,4)

If you had another value like (1,4,5) then the x would be List(4,5) in this case and y should be List (2,3,3,4) and result would be List(2,3,3,4,4,5)

Upvotes: 1

Related Questions