Reputation: 31

How to use group by using multiple keys?

I have a file that I want to read and group by employee designation and department, then find the average salary for each group. Below is the code I tried, which uses map. How do I implement this using groupBy?

import scala.io.Source

object Problem {
    case class Employee(empId: String, 
                        designation: String, 
                        age: Int, 
                        salary: Long, 
                        department: Int)

    def main(arrg:Array[String]){
        var a = Source.fromFile("someFile.txt"). 
                        getLines(). 
                        map( _.split(",") ). 
                        map( l => ((l(1)+l(4)),l(3)) ). 
                        mapValues( _.map( _.salary ).sum/_.map.size )
       print(a)
    }
}

Upvotes: 0

Answers (3)

Odomontois

Reputation: 16308

Let me share some code from my private utils stash.

Given these dependencies in build.sbt:

libraryDependencies ++= Seq(
  "com.chuusai"        %% "shapeless"    % "2.2.3",
  "org.scalaz"         %% "scalaz-core"  % "7.1.1",
  "org.typelevel"      %% "scalaz-spire" % "0.2",
  "com.github.melrief" %% "purecsv"      % "0.0.2")

This import prefix:

import purecsv.safe._
import shapeless.tag.Tagger
import scala.{util => ut}
import scalaz._
import Scalaz._
import spire.implicits._
import shapeless._
import shapeless.syntax.singleton._
import ops.hlist.{Selector, RightFolder}

This handful of utils:

trait CorrespondingLow extends Poly2 {
  implicit def drop[E, L <: HList, L2 <: HList] = at[E, (L, Tagger[L2])] { case (_, (l, aux)) => (l, aux) }
}
object CorrespondingFolder extends CorrespondingLow {
  implicit def take[E, L <: HList, L2 <: HList]
  (implicit sel2: Selector[L2, E]) = at[E, (L, Tagger[L2])] { case (e, (l, aux)) => (e :: l, aux) }
}
class corresponding[R2] {
  def move[R1, L1 <: HList, L2 <: HList, L2A <: HList]
  (rec: R1)
  (implicit lgen1: LabelledGeneric.Aux[R1, L1],
   lgen2: LabelledGeneric.Aux[R2, L2],
   rf: RightFolder.Aux[L1, (HNil, Tagger[L2]), CorrespondingFolder.type, (L2A, Tagger[L2])],
   lgen2a: LabelledGeneric.Aux[R2, L2A]): R2 =
    lgen2a.from(lgen1.to(rec).foldRight((HNil: HNil, tag[L2]))(CorrespondingFolder)._1)
}
object corresponding {
  def apply[R2] = new corresponding[R2]
}

implicit class TryOps[T](t: ut.Try[T]) {
  def toValidation: ValidationNel[Throwable, T] = t match {
    case ut.Success(v) => v.success
    case ut.Failure(ex) => ex.failureNel
  }
}

And your model:

case class Employee(empId: String,
                    designation: String,
                    age: Int,
                    salary: Long,
                    department: Int)

case class Group(designation: String, department: Int)

We could easily write:

val file = getClass.getResource("employees.csv").getFile

val employees: ValidationNel[Throwable, Seq[Employee]] =
  CSVReader[Employee]
  .readCSVFromFileName(file)
  .traverseU(_.toValidation)

val averageSalary = (_: Seq[Employee])
  .groupBy(emp => corresponding[Group].move(emp))
  .mapValues {_
    .map(emp => BigDecimal(emp.salary))
    .qmean
  }

println(employees map averageSalary)

And get your grouped output.
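For comparison, the same grouping can be sketched without shapeless or scalaz by building the key by hand. This assumes that corresponding[Group].move(emp) amounts to copying the identically named fields designation and department into a Group; the object name PlainGroupBy and the sample rows are made up for illustration.

```scala
object PlainGroupBy {
  case class Employee(empId: String,
                      designation: String,
                      age: Int,
                      salary: Long,
                      department: Int)

  case class Group(designation: String, department: Int)

  // Hypothetical sample data standing in for the parsed CSV rows.
  val employees: Seq[Employee] = Seq(
    Employee("e1", "dev", 30, 1000L, 1),
    Employee("e2", "dev", 40, 2000L, 1),
    Employee("e3", "ops", 35, 1500L, 2)
  )

  // Case classes get structural equals/hashCode, so they work as map keys.
  val averageSalary: Map[Group, BigDecimal] =
    employees
      .groupBy(emp => Group(emp.designation, emp.department))
      .map { case (group, emps) =>
        group -> (emps.map(emp => BigDecimal(emp.salary)).sum / emps.size)
      }
}
```

Here PlainGroupBy.averageSalary maps Group("dev", 1) to 1500 and Group("ops", 2) to 1500.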

Upvotes: 0

Mateusz Dymczyk

Reputation: 15141

Just groupBy a tuple:

Source.fromFile("someFile.txt").
    getLines().
    map( _.split(",") ).
    toSeq.
    map(data => Employee(data(0), data(1), data(2).toInt, data(3).toLong, data(4).toInt)).
    groupBy(emp => (emp.designation, emp.department)).
    mapValues(emp => emp.map(_.salary).sum / emp.length )
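The pipeline above relies on the Employee case class from the question and assumes every line has five well-formed fields. A minimal sketch that factors parsing into a function and skips lines that fail to parse might look like this; parseLine, averageByGroup, and the object name are hypothetical, and silently dropping bad rows is an assumption about what you want.

```scala
import scala.util.Try

object GroupByTuple {
  case class Employee(empId: String,
                      designation: String,
                      age: Int,
                      salary: Long,
                      department: Int)

  // Returns None when the line has too few fields or a non-numeric value.
  def parseLine(line: String): Option[Employee] = {
    val f = line.split(",").map(_.trim)
    Try(Employee(f(0), f(1), f(2).toInt, f(3).toLong, f(4).toInt)).toOption
  }

  // Groups by the (designation, department) tuple and averages the salaries.
  // The division is Long / Int, so the result is truncated like in the answer.
  def averageByGroup(lines: Iterator[String]): Map[(String, Int), Long] =
    lines.toList
      .flatMap(line => parseLine(line))
      .groupBy(emp => (emp.designation, emp.department))
      .map { case (key, emps) => key -> (emps.map(_.salary).sum / emps.length) }
}
```

It could then be called as GroupByTuple.averageByGroup(Source.fromFile("someFile.txt").getLines()), with scala.io.Source imported as in the question.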

Upvotes: 0

Peter Neyens

Reputation: 9820

You can group by a tuple:

val employees = List(
  Employee("id", "des", 30, 1000, 1),
  Employee("id", "des2", 35, 1500, 1),
  Employee("id", "des", 40, 2000, 1)
)

employees
  .groupBy(e => (e.designation, e.department))
  .mapValues(emps => emps.map(_.salary).sum / emps.length)

// Map((des,1) -> 1500, (des2,1) -> 1500)
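One caveat: salary is a Long and emps.length is an Int, so the division above is integer division and drops any fractional part. If a fractional average matters, a sketch along these lines converts to Double first (the sample salaries and the object name ExactAverage are invented for illustration):

```scala
object ExactAverage {
  case class Employee(empId: String,
                      designation: String,
                      age: Int,
                      salary: Long,
                      department: Int)

  // Hypothetical data chosen so that the true average is not a whole number.
  val employees = List(
    Employee("id1", "des", 30, 1000L, 1),
    Employee("id2", "des", 40, 2001L, 1)
  )

  private val grouped = employees.groupBy(e => (e.designation, e.department))

  // Long / Int: 3001 / 2 is truncated to 1500.
  val truncated: Map[(String, Int), Long] =
    grouped.map { case (key, emps) => key -> (emps.map(_.salary).sum / emps.length) }

  // Converting the sum to Double first keeps the fraction: 3001.0 / 2 = 1500.5.
  val exact: Map[(String, Int), Double] =
    grouped.map { case (key, emps) => key -> (emps.map(_.salary).sum.toDouble / emps.length) }
}
```

Using map with a pattern match instead of mapValues also avoids the lazy view that mapValues returns in newer Scala versions.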

Upvotes: 1
