sras

Reputation: 838

What happens when reduceByKey is in action?

My mapToPair function produces the output below.

(a, 1) (a, 1) (b, 1)

I'm reducing the values with the reduceByKey function, using the following code:

  // Reducer passed to reduceByKey: combines two values that share a key.
  private static final Function2<Integer, Integer, Integer> WORDS_REDUCER =
      new Function2<Integer, Integer, Integer>() {
        @Override
        public Integer call(Integer a, Integer b) throws Exception {
          return a + b;
        }
      };

It works fine, but can someone explain how this code behaves for the pair (b, 1), where the key has only one value?

Upvotes: 0

Views: 551

Answers (1)

foundry

Reputation: 31745

It isn't quite clear from the question what you don't understand, but perhaps this will help...

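reduceByKey with the function x+y acts as an accumulator, summing the values per key. The function is only called when a key has at least two values to combine. If you only have one value for a particular key, as with (b, 1), the function is never called for that key and the value is returned unchanged as the result.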

Here's an example using PySpark:

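  from pyspark import SparkContext

  sc = SparkContext.getOrCreate()  # reuses the existing context in a pyspark shell
  testrdd = sc.parallelize([('a', 1), ('a', 1), ('b', 1)])
  testrdd = testrdd.reduceByKey(lambda x, y: x + y)  # sum the values per key
  result = testrdd.collect()
  print("result: {}".format(result))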

  >result: [('a', 2), ('b', 1)]
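
To make the (b, 1) case concrete, here is a rough plain-Python sketch that mimics the per-key fold on a local list. It is not how Spark implements reduceByKey internally (there are no partitions or shuffles here), and `reduce_by_key_sketch`, `add` and `calls` are hypothetical names made up for illustration. It only records how often the reducer gets called:

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key_sketch(pairs, func):
    """Hypothetical local stand-in for reduceByKey: fold the values of each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # functools.reduce without an initial value returns the sole element of a
    # one-item list as-is, without ever calling func.
    return {key: reduce(func, values) for key, values in grouped.items()}


calls = []  # every (x, y) pair the reducer is invoked with


def add(x, y):
    calls.append((x, y))
    return x + y


result = reduce_by_key_sketch([("a", 1), ("a", 1), ("b", 1)], add)
print(result)  # {'a': 2, 'b': 1}
print(calls)   # [(1, 1)]
```

The reducer runs once, for the two values of 'a'. For 'b' there is nothing to combine, so its single value passes straight through.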

Upvotes: 1
