Reputation: 227
I am trying to reduce values by key in plain Scala. Is there a method that reduces the values based on their keys? (I know Spark provides reduceByKey, but how do we do the same in plain Scala?)
The input data is read like this:
import scala.io.Source
val File = Source.fromFile("C:/Users/svk12/git/data/retail_db/order_items/part-00000")
  .getLines()
  .toList
val map = File.map(x => x.split(","))
  .map(x => (x(1), x(4)))
map.take(10).foreach(println)
After the above step I get the following result:
(2,250.0)
(2,129.99)
(4,49.98)
(4,299.95)
(4,150.0)
(4,199.92)
(5,299.98)
(5,299.95)
Expected result:
(2,379.99)
(5,499.93)
.......
Upvotes: 7
Views: 2015
Reputation: 27373
Here is another solution using foldLeft:
val File: List[String] = ???
File.map(x => x.split(","))
  .map(x => (x(1), x(4).toDouble)) // values such as "250.0" are not valid Ints
  .foldLeft(Map.empty[String, Double]) { case (state, (key, value)) =>
    state.updated(key, state.getOrElse(key, 0.0) + value)
  }
  .toSeq
  .sortBy(_._1)
  .take(10)
  .foreach(println)
Upvotes: 0
Reputation: 61666
Starting with Scala 2.13, you can use the groupMapReduce method, which is (as its name suggests) the equivalent of a groupBy followed by a mapValues and a reduce step:
io.Source.fromFile("file.txt")
.getLines.to(LazyList)
.map(_.split(','))
.groupMapReduce(_(1))(_(4).toDouble)(_ + _)
The groupMapReduce stage:
groups the split arrays by their 2nd element (_(1)) (group part of groupMapReduce)
maps each array within each group to its 5th element, at index 4, and converts it to Double (_(4).toDouble) (map part of groupMapReduce)
reduces the values within each group (_ + _) by summing them (reduce part of groupMapReduce).
This is a one-pass version of what could otherwise be written as:
seq.groupBy(_(1)).mapValues(_.map(_(4).toDouble).reduce(_ + _))
Also note the conversion from Iterator to LazyList, needed in order to use a collection which provides groupMapReduce (we don't use a Stream since, starting with Scala 2.13, LazyList is the recommended replacement for Stream).
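As a self-contained illustration of the same groupMapReduce call, here is a minimal sketch that works on in-memory lines instead of a file. The object name, the totals helper and the sample rows are assumptions made for this example: only columns 1 and 4 mirror the values shown in the question, and the other columns are placeholders. It requires Scala 2.13 or later.

```scala
object GroupMapReduceSketch {
  // Hypothetical rows: only column 1 (key) and column 4 (amount) mirror the
  // question's output; the remaining columns are placeholders.
  val sampleLines: List[String] = List(
    "0,2,0,0,250.0",
    "0,2,0,0,129.99",
    "0,4,0,0,49.98",
    "0,4,0,0,299.95"
  )

  // Group by column 1, map each row to column 4 as a Double, sum per key.
  def totals(lines: Iterable[String]): Map[String, Double] =
    lines
      .map(_.split(','))
      .groupMapReduce(_(1))(_(4).toDouble)(_ + _)

  def main(args: Array[String]): Unit =
    totals(sampleLines).toSeq.sortBy(_._1).foreach(println)
}
```

Because the amounts are Doubles, the printed sums can show small floating-point rounding differences; BigDecimal would be an option if exact decimal totals matter.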
Upvotes: 5
Reputation: 51271
It looks like you want the sum of some values from a file. One problem is that everything read from a file is a String, so you have to convert each String to a numeric type before it can be summed.
These are the steps you might use.
io.Source.fromFile("so.txt")             //open file
  .getLines()                            //read line-by-line
  .map(_.split(","))                     //each line is Array[String]
  .toSeq                                 //to something that can groupBy()
  .groupBy(_(1))                         //now is Map[String,Seq[Array[String]]]
  .mapValues(_.map(_(4).toDouble).sum)   //now String -> Double ("250.0" is not a valid Int)
  .toSeq                                 //un-Map it to (String,Double) tuples
  .sortBy(_._1)                          //presentation order, by key
  .take(10)                              //sample
  .foreach(println)                      //report
This will, of course, throw if any file data is not in the required format.
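If throwing on bad input is not acceptable, one option is to skip rows that cannot be parsed. The following is only a sketch under that assumption: the SafeSumSketch object and its parse and totals helpers are names invented for this example, and silently dropping malformed rows is a design choice that may not suit every data set.

```scala
import scala.util.Try

object SafeSumSketch {
  // Returns Some((key, amount)) for a well-formed row, None otherwise.
  def parse(line: String): Option[(String, Double)] = {
    val cols = line.split(",")
    if (cols.length > 4)
      Try(cols(4).trim.toDouble).toOption.map(amount => (cols(1), amount))
    else
      None
  }

  // Sums column 4 per key in column 1, ignoring rows that fail to parse.
  def totals(lines: Iterator[String]): Map[String, Double] =
    lines
      .flatMap(line => parse(line).toList)
      .toSeq
      .groupBy(_._1)
      .map { case (key, rows) => key -> rows.map(_._2).sum }
}
```

It could be called as SafeSumSketch.totals(io.Source.fromFile("so.txt").getLines()). Logging or counting the rejected rows instead of discarding them would be a natural extension.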
Upvotes: 3
Reputation: 1181
First group the tuples by key (the first element here), then reduce the values of each group. The following code will work:
// the values are still Strings at this point, so convert them before adding
val reducedList = map.groupBy(_._1).map(l => (l._1, l._2.map(_._2.toDouble).reduce(_ + _)))
print(reducedList)
Upvotes: 1
Reputation: 127761
There is nothing built-in, but you can write it like this:
def reduceByKey[A, B](items: Traversable[(A, B)])(f: (B, B) => B): Map[A, B] = {
  var result = Map.empty[A, B]
  items.foreach {
    case (a, b) =>
      result += (a -> result.get(a).map(b1 => f(b1, b)).getOrElse(b))
  }
  result
}
There is some room to optimize this (e.g. by using a mutable map), but the general idea remains the same.
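As an illustration of the mutable-map optimization mentioned above, here is a minimal sketch. The object name is an assumption for this example, and it takes an Iterable rather than a Traversable so that it also compiles without warnings on newer Scala versions. Whether it is actually faster depends on the data, so treat it as a starting point rather than a measured improvement.

```scala
import scala.collection.mutable

object MutableReduceByKeySketch {
  def reduceByKey[A, B](items: Iterable[(A, B)])(f: (B, B) => B): Map[A, B] = {
    // Local mutable accumulator; it never escapes this method.
    val acc = mutable.Map.empty[A, B]
    items.foreach { case (a, b) =>
      // Combine with the running value if the key was seen before, else start with b.
      acc.update(a, acc.get(a).map(f(_, b)).getOrElse(b))
    }
    acc.toMap // hand back an immutable Map
  }
}
```

For example, MutableReduceByKeySketch.reduceByKey(List("a" -> 1, "b" -> 2, "a" -> 3))(_ + _) combines the two values stored under "a" and leaves "b" unchanged.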
Another approach, more declarative but less efficient (it creates several intermediate collections; it can be rewritten to avoid them, but with a loss of clarity):
def reduceByKey[A, B](items: Traversable[(A, B)])(f: (B, B) => B): Map[A, B] = {
  items
    .groupBy { case (a, _) => a }
    // map (rather than mapValues) builds a strict Map instead of a lazy view
    .map { case (a, pairs) => a -> pairs.map { case (_, b) => b }.reduce(f) }
}
Upvotes: 1