Add value with groupByKey

Question

I'm having trouble with groupByKey in Scala and Spark. I have two case classes:

case class Employee(id_employee: Long, name_emp: String, salary: String)

For the moment, I use this second case class:

case class Company(id_company: Long, employee:Seq[Employee])

However, I want to replace it with this new one:

case class Company(id_company: Long, name_comp: String, employee: Seq[Employee])

There is a parent Dataset (df1) that I group with groupByKey to create Company objects:

val companies = df1.groupByKey(v => v.id_company)
  .mapGroups { case (k, iter) =>
    Company(k, iter.map(x => Employee(x.id_employee, x.name_emp, x.salary)).toSeq)
  }
  .collect()

This code works; it returns objects like this one:

Company(1234,List(Employee(0987, John, 30000),Employee(4567, Bob, 50000)))

But I can't figure out how to add the company's name_comp to those objects (this field exists in df1), so that I get objects like this (using the new case class):

Company(1234, NYTimes, List(Employee(0987, John, 30000),Employee(4567, Bob, 50000)))

Shaido · Accepted Answer

Since you want both the company id and name, you can use a tuple as the key when you group your data. Both values are then available from the key when constructing each Company object:

df1.groupByKey(v => (v.id_company, v.name_comp))
  .mapGroups{ case((id, name), iter) => 
    Company(id, name, iter.map(x => Employee(x.id_employee, x.name_emp, x.salary)).toSeq)}
  .collect()
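To see the tuple-key idea in isolation, here is a minimal sketch that uses plain Scala collections instead of Spark, so it runs without a SparkSession. Assumptions: `groupBy` on a `Seq` stands in for `Dataset.groupByKey`/`mapGroups`; `CompanyRow` is a hypothetical flat row type standing in for the rows of df1; the sample data is made up. The shape is the same as the answer: group on `(id_company, name_comp)`, then destructure the key with `case ((id, name), group)`.

```scala
// Hypothetical flat row type standing in for the rows of df1.
case class CompanyRow(id_company: Long, name_comp: String,
                      id_employee: Long, name_emp: String, salary: String)

case class Employee(id_employee: Long, name_emp: String, salary: String)

case class Company(id_company: Long, name_comp: String, employee: Seq[Employee])

object CompanyGrouping {
  // Made-up sample rows for illustration only.
  val rows: Seq[CompanyRow] = Seq(
    CompanyRow(1234L, "NYTimes", 987L, "John", "30000"),
    CompanyRow(1234L, "NYTimes", 4567L, "Bob", "50000"),
    CompanyRow(42L, "Acme", 1L, "Alice", "40000")
  )

  // Group on the (id, name) tuple, then pull both values out of the key
  // when building each Company, mirroring groupByKey + mapGroups above.
  def toCompanies(rows: Seq[CompanyRow]): Seq[Company] =
    rows
      .groupBy(r => (r.id_company, r.name_comp))
      .map { case ((id, name), group) =>
        Company(id, name, group.map(x => Employee(x.id_employee, x.name_emp, x.salary)))
      }
      .toSeq
      .sortBy(_.id_company) // groupBy yields an unordered Map; sort for stable output

  def main(args: Array[String]): Unit =
    toCompanies(rows).foreach(println)
}
```

One design consequence of keying on the tuple: if the same id_company appears in df1 with two different name_comp values, you get two separate Company objects for that id. If that matters, an alternative is to keep grouping by id_company alone and read the name from the first row of the group, materializing the iterator first (for example `val rs = iter.toList`) so it is not consumed twice. In real Spark code, `mapGroups` also needs an Encoder for Company, which is usually brought into scope with `import spark.implicits._`.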
