user4046073

Reputation: 871

How to apply functions in Scala case class for transforming dataframes

Quite new here. I'm trying to convert a DataFrame (with two columns, a and b) to a case class, apply a function mathAdd to column a, and put the result in a new column c. I know about .withColumn, but I don't know how to put these pieces together. Below is my attempt, with comments. Could anyone please help? Many thanks. *Edited: one of the reasons I want to use a case class is that I'd like to save those functions for reuse.

  dfTest.createOrReplaceTempView("testTable") 

  case class testclass(a: Int, b: String) {
    var result = 0
    def mathAdd = {
      if (b == "apple") {
        result = a + 1
      } else {
        result = a + 2
        // but how to put 'var result' into a column?
      }
    }
  }

 var toTestClass = sqlContext.table("testTable").as[testclass] 
 toTestClass.mathAdd()
 //After this how can I convert this testclass back to dataframe?  

Upvotes: 1

Views: 1240

Answers (2)

Raphael Roth

Reputation: 27373

You can just invoke your instance method in map:

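case class testclass(a: Int, b: String) {
  var result = 0

  // Stores the outcome in `result` and also returns it.
  def mathAdd: Int = {
    result = if (b == "apple") a + 1 else a + 2
    result
  }
}

// Brings the encoders needed by .as[testclass] and .map into scope.
import sqlContext.implicits._

val transformed = sqlContext.table("testTable").as[testclass].map(tc => tc.mathAdd)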

This will get you a Dataset[Int].

However, I would rather define mathAdd as a separate method, since case classes are normally not meant to contain logic:

case class testclass(a: Int, b: String)

def mathAdd(tc: testclass): Int = {
  if (tc.b == "apple") {
    tc.a + 1
  } else {
    tc.a + 2
  }
}

val transformed = sqlContext.table("testTable").as[testclass].map(tc => mathAdd(tc))
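
To get back to a DataFrame that keeps a and b and adds the result as a column c (the part the question asks about), one option is to map each row to a second case class and call toDF(). This is only a minimal sketch, assuming a local SparkSession; testclassWithC, MathAddExample, and the sample rows are made up for illustration and do not come from the posts above:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Input row type, as in the question.
case class testclass(a: Int, b: String)
// Hypothetical output row type that carries the extra column c.
case class testclassWithC(a: Int, b: String, c: Int)

object MathAddExample {
  // Pure function, so it can be reused and unit-tested without Spark.
  def mathAdd(tc: testclass): Int =
    if (tc.b == "apple") tc.a + 1 else tc.a + 2

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("mathAdd")
      .getOrCreate()
    import spark.implicits._ // encoders for the case classes

    // Stand-in for the question's dfTest (assumed sample data).
    val dfTest = Seq((1, "apple"), (5, "pear")).toDF("a", "b")

    // DataFrame -> Dataset[testclass] -> Dataset[testclassWithC] -> DataFrame
    val withC: DataFrame = dfTest
      .as[testclass]
      .map(tc => testclassWithC(tc.a, tc.b, mathAdd(tc)))
      .toDF()

    withC.show()
    spark.stop()
  }
}
```

The case classes are declared at the top level rather than inside main so that Spark can derive encoders for them.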

Upvotes: 1

Ramesh Maharjan

Reputation: 41987

You can achieve what you intend to do with the case class by using the built-in when function together with the withColumn API:

import org.apache.spark.sql.functions._
df.withColumn("newCol", when(col("b") === "apple", col("a") + 1).otherwise(col("a") + 2))

So I guess you don't need a case class.
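
If the aim is mainly to keep the logic reusable, the same when/otherwise expression can probably be wrapped in an ordinary Scala function that takes and returns Column values. A rough sketch under that assumption; ColumnFunctions, withC, and the sample data are hypothetical names introduced here, not part of Spark or of the original posts:

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object ColumnFunctions {
  // Reusable column expression: a + 1 where b is "apple", otherwise a + 2.
  def mathAdd(a: Column, b: Column): Column =
    when(b === "apple", a + 1).otherwise(a + 2)

  // Adds column c to any DataFrame that has columns a and b.
  def withC(df: DataFrame): DataFrame =
    df.withColumn("c", mathAdd(col("a"), col("b")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("columnFunctions")
      .getOrCreate()
    import spark.implicits._ // needed for toDF on a local Seq

    // Assumed sample data standing in for the question's DataFrame.
    val df = Seq((1, "apple"), (5, "pear")).toDF("a", "b")
    withC(df).show()
    spark.stop()
  }
}
```

Because mathAdd builds a Column expression rather than running per-row Scala code, Spark can optimize it like any other built-in expression, and no case class or Dataset conversion is involved.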

Upvotes: 1
