Reputation: 227

How to Reduce by key in "Scala" [Not In Spark]

I am trying to reduce values by key in plain Scala. Is there a method that reduces the values based on their keys? (I know Spark has the reduceByKey method, but how do I do the same thing with plain Scala collections?)

The input data is read like this:

import scala.io.Source

val File = Source.fromFile("C:/Users/svk12/git/data/retail_db/order_items/part-00000")
  .getLines()
  .toList

val map = File.map(x => x.split(","))
  .map(x => (x(1), x(4)))

map.take(10).foreach(println)

After the step above, I get the following result:

(2,250.0)
(2,129.99)
(4,49.98)
(4,299.95)
(4,150.0)
(4,199.92)
(5,299.98)
(5,299.95)

Expected result:

(2,379.99)
(5,599.93)
...

Upvotes: 7

Answers (5)

Raphael Roth

Reputation: 27383

Here is another solution, using foldLeft:

val File: List[String] = ???

File.map(x => x.split(","))
  .map(x => (x(1), x(4).toDouble))  // the values (e.g. 250.0) are decimals, so parse them as Double
  .foldLeft(Map.empty[String, Double]) { case (state, (key, value)) =>
    state.updated(key, state.getOrElse(key, 0.0) + value)  // add to this key's running total
  }
  .toSeq
  .sortBy(_._1)
  .take(10)
  .foreach(println)
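To check the foldLeft approach without the data file, here is a minimal sketch that runs the same fold on the (key, value) pairs printed in the question. It assumes the values are parsed with toDouble. The helper name sumByKey is hypothetical.

```scala
// Hypothetical helper: sums the values for each key with a single foldLeft.
def sumByKey(pairs: List[(String, Double)]): Map[String, Double] =
  pairs.foldLeft(Map.empty[String, Double]) { case (acc, (key, value)) =>
    // add value to the running total for key (starting from 0.0 for a new key)
    acc.updated(key, acc.getOrElse(key, 0.0) + value)
  }

// The pairs printed in the question, with the String values parsed as Double.
val samplePairs: List[(String, Double)] = List(
  "2" -> "250.0", "2" -> "129.99",
  "4" -> "49.98", "4" -> "299.95", "4" -> "150.0", "4" -> "199.92",
  "5" -> "299.98", "5" -> "299.95"
).map { case (k, v) => (k, v.toDouble) }

val totals = sumByKey(samplePairs)
totals.toSeq.sortBy(_._1).foreach(println)
```

Because the sums are Doubles, the printed values may show small floating-point rounding artifacts. Use BigDecimal if you need exact currency arithmetic.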

Upvotes: 0

Xavier Guihot

Reputation: 61766

Starting with Scala 2.13, you can use the groupMapReduce method, which (as its name suggests) is equivalent to a groupBy followed by mapValues and a reduce step:

io.Source.fromFile("file.txt")
  .getLines().to(LazyList)
  .map(_.split(','))
  .groupMapReduce(_(1))(_(4).toDouble)(_ + _)

The groupMapReduce stage:

- groups the split arrays by their 2nd element (_(1)) (the group part of groupMapReduce);
- maps each array within each group to its 5th element, parsed as a Double (_(4).toDouble) (the map part of groupMapReduce);
- reduces the values within each group by summing them (_ + _) (the reduce part of groupMapReduce).

This is a one-pass version of the following:

seq.groupBy(_(1)).mapValues(_.map(_(4).toDouble).reduce(_ + _))

Also note the conversion from Iterator to LazyList. Iterator does not provide groupMapReduce, so we first move to a collection that does. (We don't use a Stream, since starting with Scala 2.13, LazyList is the recommended replacement for Stream.)
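Here is a self-contained sketch of the same groupMapReduce pipeline, assuming Scala 2.13 or later. Instead of reading file.txt, it uses a few hypothetical CSV lines. Only the 2nd column (the key) and the 5th column (the value) are taken from the question's output. The other columns are placeholder zeros.

```scala
// Assumes Scala 2.13+, where groupMapReduce is available.
// Hypothetical CSV lines: columns 1 and 4 come from the question's output,
// the remaining columns are placeholder zeros.
val lines: List[String] = List(
  "0,2,0,0,250.0", "0,2,0,0,129.99",
  "0,4,0,0,49.98", "0,4,0,0,299.95", "0,4,0,0,150.0", "0,4,0,0,199.92",
  "0,5,0,0,299.98", "0,5,0,0,299.95"
)

val byKey: Map[String, Double] =
  lines
    .map(_.split(','))
    // group by the 2nd column, map each row to its 5th column as a Double,
    // then reduce each group by summing
    .groupMapReduce(_(1))(_(4).toDouble)(_ + _)

byKey.toSeq.sortBy(_._1).foreach(println)
```

A List is already a strict collection, so no LazyList conversion is needed here. The conversion only matters when starting from the Iterator returned by getLines().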

Upvotes: 5

jwvh

Reputation: 51271

It looks like you want the sum of some values from a file. One problem is that the lines of a file are strings, so you have to convert each String to a number before it can be summed.

These are the steps you might use.

io.Source.fromFile("so.txt")           //open file
  .getLines()                          //read line-by-line
  .map(_.split(","))                   //each line is Array[String]
  .toSeq                               //to something that can groupBy()
  .groupBy(_(1))                       //now is Map[String,Seq[Array[String]]]
  .mapValues(_.map(_(4).toDouble).sum) //now is Map[String,Double] (values like 250.0 are decimals)
  .toSeq                               //un-Map it to (String,Double) tuples
  .sorted                              //presentation order
  .take(10)                            //sample
  .foreach(println)                    //report

This will, of course, throw if any file data is not in the required format.
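If you would rather skip malformed lines than throw, here is one possible variant. It is a sketch that assumes Scala 2.13 or later, which provides String.toDoubleOption and groupMapReduce. The function name sumByKeySkippingBadLines and the sample input are hypothetical. The function drops any line that has fewer than five columns or whose 5th column is not a number:

```scala
// Sketch only: skips malformed lines instead of throwing.
// Assumes Scala 2.13+ (String.toDoubleOption, groupMapReduce).
// sumByKeySkippingBadLines is a hypothetical helper name.
def sumByKeySkippingBadLines(lines: Iterator[String]): Map[String, Double] =
  lines
    .map(_.split(","))
    .flatMap { cols =>
      // keep the line only if it has a 5th column that parses as a number
      if (cols.length > 4) cols(4).toDoubleOption.map(v => (cols(1), v))
      else None
    }
    .toList                            // Iterator has no groupMapReduce
    .groupMapReduce(_._1)(_._2)(_ + _) // sum the values for each key

// Hypothetical input: only the 1st and 4th lines are well-formed.
val tolerant = sumByKeySkippingBadLines(
  Iterator("0,2,0,0,250.0", "a,b,c,d,not-a-number", "too,short", "0,2,0,0,129.99")
)
println(tolerant)
```

Silently dropping lines can hide data problems, so you may want to count or log the rejected lines as well.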

Upvotes: 3

Ajay Srivastava

Reputation: 1181

First group the tuples by key (the first element here), then reduce each group, converting the String values to Double first. The following code will work:

val reducedList = map.groupBy(_._1).map { case (key, pairs) => (key, pairs.map(_._2.toDouble).reduce(_ + _)) }
println(reducedList)
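One detail is worth seeing in isolation. In the question's map, the values are still Strings, so reducing them with _ + _ joins the text rather than adding numbers. Here is a minimal sketch on a subset of the question's pairs. The names pairs, concatenated, and summed are hypothetical.

```scala
// A subset of the (key, value) pairs printed in the question; as produced by
// x(4), the values are still Strings at this point.
val pairs: List[(String, String)] = List(
  "2" -> "250.0", "2" -> "129.99",
  "5" -> "299.98", "5" -> "299.95"
)

// Reducing the String values joins the text instead of adding numbers.
val concatenated: Map[String, String] =
  pairs.groupBy(_._1).map { case (k, ps) => (k, ps.map(_._2).reduce(_ + _)) }

// Parsing each value with toDouble first gives the intended sums.
val summed: Map[String, Double] =
  pairs.groupBy(_._1).map { case (k, ps) => (k, ps.map(_._2.toDouble).reduce(_ + _)) }

println(concatenated) // key "2" maps to "250.0129.99"
println(summed)
```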

Upvotes: 1

Vladimir Matveev

Reputation: 128111

There is nothing built-in before Scala 2.13 (which added groupMapReduce), but you can write it like this:

def reduceByKey[A, B](items: Iterable[(A, B)])(f: (B, B) => B): Map[A, B] = {
  var result = Map.empty[A, B]
  items.foreach {
    case (a, b) =>
      // combine with the value already stored for this key, or start with b
      result += (a -> result.get(a).map(b1 => f(b1, b)).getOrElse(b))
  }
  result
}

There is some room to optimize this (e.g. by using a mutable map), but the general idea remains the same.
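Here is a sketch of the mutable-map optimization. The name reduceByKeyMutable is hypothetical, and this is not code from the answer. It updates a single mutable.Map in place and copies it into an immutable Map at the end.

```scala
import scala.collection.mutable

// Sketch of the mutable-map variant (hypothetical name reduceByKeyMutable):
// a single pass that updates one mutable.Map in place.
def reduceByKeyMutable[A, B](items: Iterable[(A, B)])(f: (B, B) => B): Map[A, B] = {
  val acc = mutable.Map.empty[A, B]
  items.foreach { case (a, b) =>
    // combine with the value already stored for this key, or store b if the key is new
    acc.update(a, acc.get(a).map(prev => f(prev, b)).getOrElse(b))
  }
  acc.toMap // copy into an immutable Map for the caller
}

val sums = reduceByKeyMutable(List("2" -> 250.0, "2" -> 129.99, "5" -> 299.98, "5" -> 299.95))(_ + _)
println(sums)
```

The mutation stays local to the function, so callers still see an ordinary immutable Map.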

Another approach is more declarative but less efficient (it creates several intermediate collections; it can be rewritten to avoid them, but at some loss of clarity):

def reduceByKey[A, B](items: Iterable[(A, B)])(f: (B, B) => B): Map[A, B] = {
  items
    .groupBy { case (a, _) => a }
    // build a strict Map directly; mapValues would return a lazy view that
    // re-applies the function on every access
    .map { case (a, pairs) => a -> pairs.map { case (_, b) => b }.reduce(f) }
}

Upvotes: 1
