Mihir

Reputation: 603

Group by multiple columns as key and sum multiple columns, like SQL?

I am using Scala 2.12.

I have a case class as follows:

case class MyClass(date: java.util.Date, book: String, priceLocal: Double, priceConv: Double)

I am able to group by date and book.

For instance, for:

val listOfMyClass = List(
  MyClass(20190708, "book1", 100, 120),
  MyClass(20190708, "book1", 200, 220),
  MyClass(20190708, "book2", 50, 60),
  MyClass(20190708, "book2", 60, 70)
)

val groupedData = listOfMyClass.groupBy(t => (t.date, t.book))

I want the result to look like the output of a SQL GROUP BY with sums:

(20190708, "book1", 300, 340)
(20190708, "book2", 110, 130)

I am able to map and sum one column, but I cannot work out how to sum both price columns.

val groupedDataSum = listOfMyClass.groupBy(t => (t.date, t.book)).mapValues(_.map(_.priceLocal).sum)

How can I sum the second column (priceConv) as well?

Upvotes: 2

Views: 621

Answers (4)

Kevin Lawrence

Reputation: 62

To get the SQL-like output you asked for, add one final map over the Map[(Date, String), (Double, Double)] produced by the mapValues and reduce operations.

listOfMyClass
  .groupBy(a => (a.date, a.book))
  .mapValues(_.map(e => (e.priceLocal, e.priceConv)).reduce((a, b) => (a._1 + b._1, a._2 + b._2)))
  .map(x => (x._1._1, x._1._2, x._2._1, x._2._2)) // final map gives the SQL-type rows (date, book, sum of priceLocal, sum of priceConv)

Upvotes: 0

Jegan

Reputation: 1751

mapValues followed by reduce should do the trick. Here is some sample code.

  val grouped = listOfMyClass.groupBy(t => (t.date, t.book))
    .mapValues(lst => lst.reduce((m1, m2) => 
      MyClass(m1.date, m1.book, m1.priceLocal + m2.priceLocal, m1.priceConv + m2.priceConv))).values

This returns an Iterable of the reduced MyClass instances, one per (date, book) group.

Upvotes: 0

Leo C

Reputation: 22449

You could map priceLocal and priceConv to a Tuple, followed by an element-wise reduce to sum the individual Tuple elements:

listOfMyClass.groupBy(t => (t.date, t.book)).mapValues(
  _.map(s => (s.priceLocal, s.priceConv)).
    reduce((acc, x) => (acc._1 + x._1, acc._2 + x._2))
)

Upvotes: 1

Xavier Guihot

Reputation: 61666

You could use a mix of groupBy (groups elements by date and book), and reduce to accumulate the grouped values:

// val list = List(
//   MyClass(Date(2019, 7, 8), "book1", 100, 120),
//   MyClass(Date(2019, 7, 8), "book1", 200, 220),
//   MyClass(Date(2019, 7, 8), "book2", 50, 60),
//   MyClass(Date(2019, 7, 8), "book2", 60, 70)
// )
list
  .groupBy { case MyClass(date, book, _, _) => (date, book) }
  .mapValues { values =>
    values
      .map { case MyClass(_, _, priceLocal, priceConv) => (priceLocal, priceConv) }
      .reduce((x, y) => (x._1 + y._1, x._2 + y._2))
  }
  .map { case ((date, book), (priceLocal, priceConv)) =>
    (date, book, priceLocal, priceConv)
  }
// List(
//   (Date(2019, 7, 8), "book1", 300, 340),
//   (Date(2019, 7, 8), "book2", 110, 130)
// )

This:

  • groups elements by date and book (groupBy)

  • maps each group's values (mapValues) by:

    • mapping the values to tuples of prices
    • and reducing these tuples by summing them element by element

  • maps each ((date, book), (priceLocal, priceConv)) entry of the resulting Map to a 4-element tuple
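
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch, assuming Scala 2.12 and the MyClass definition from the question. The day helper and the GroupSumSketch object are hypothetical names added only to make the example compile, and the final sortBy is there only because Map iteration order is unspecified. On Scala 2.13+, mapValues on a Map is deprecated in favour of .view.mapValues, so expect a warning there.

```scala
import java.text.SimpleDateFormat
import java.util.Date

case class MyClass(date: Date, book: String, priceLocal: Double, priceConv: Double)

object GroupSumSketch {
  // Hypothetical helper (not in the original post): parse a yyyyMMdd string into a java.util.Date.
  def day(s: String): Date = new SimpleDateFormat("yyyyMMdd").parse(s)

  val list = List(
    MyClass(day("20190708"), "book1", 100.0, 120.0),
    MyClass(day("20190708"), "book1", 200.0, 220.0),
    MyClass(day("20190708"), "book2", 50.0, 60.0),
    MyClass(day("20190708"), "book2", 60.0, 70.0)
  )

  val summed: List[(Date, String, Double, Double)] =
    list
      // 1. group by the composite key (date, book)
      .groupBy(m => (m.date, m.book))
      // 2. per group: keep only the two prices, then sum them element by element
      .mapValues(
        _.map(m => (m.priceLocal, m.priceConv))
          .reduce((x, y) => (x._1 + y._1, x._2 + y._2))
      )
      // 3. flatten ((date, book), (priceLocal, priceConv)) into a 4-element tuple
      .map { case ((date, book), (priceLocal, priceConv)) =>
        (date, book, priceLocal, priceConv)
      }
      .toList
      // sort by book so the result order does not depend on Map iteration order
      .sortBy(_._2)

  def main(args: Array[String]): Unit = summed.foreach(println)
}
```

Grouping works with java.util.Date keys because Date implements equals and hashCode based on its timestamp, so two Date instances parsed from the same string land in the same group.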

Upvotes: 1
