Reputation: 871
I'm quite new here. I'm trying to convert a DataFrame (with two columns, a and b) to a case class, apply a function mathAdd to column a, and put the result in a new column c. I know about the .withColumn function, but I don't know how to put these pieces together. Below is my attempt, with comments. Could anyone please help? Many thanks.
*Edited: one of the reasons I want to use a case class is that I'd like to save these functions for reuse.
dfTest.createOrReplaceTempView("testTable")

case class testclass(a: Int, b: String) {
  var result = 0
  def mathAdd = {
    if (b == "apple") {
      result = a + 1
    } else {
      result = a + 2
      // but how to put 'var result' into a column?
    }
  }
}
var toTestClass = sqlContext.table("testTable").as[testclass]
toTestClass.mathAdd()
//After this how can I convert this testclass back to dataframe?
Upvotes: 1
Views: 1240
Reputation: 27373
You can just invoke your instance method in map:
case class testclass(a: Int, b: String) {
  var result = 0
  def mathAdd: Int = {
    if (b == "apple") {
      result = a + 1
    } else {
      result = a + 2
    }
    result
  }
}
val transformed = sqlContext.table("testTable").as[testclass].map(tc => tc.mathAdd)
This will get you a Dataset[Int].
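If you also want to keep the original columns and land the result in a new column c (as the question asks), here is a minimal sketch, assuming the implicits are in scope for the tuple Encoder and toDF:

import sqlContext.implicits._ // or spark.implicits._ on Spark 2.x

// carry the original columns along with the computed value,
// then name the third column "c" when converting back to a DataFrame
val withC = sqlContext.table("testTable")
  .as[testclass]
  .map(tc => (tc.a, tc.b, tc.mathAdd))
  .toDF("a", "b", "c")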
But I would rather define mathAdd as a separate method; case classes are normally not meant to contain logic:
case class testclass(a: Int, b: String)

def mathAdd(tc: testclass): Int = {
  if (tc.b == "apple") {
    tc.a + 1
  } else {
    tc.a + 2
  }
}
val transformed = sqlContext.table("testTable").as[testclass].map(tc => mathAdd(tc))
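Since your edit mentions wanting to save these functions for reuse, one option is to wrap the same logic in a UDF so it plugs straight into withColumn; a sketch (mathAddUdf is just a name chosen here for illustration):

import org.apache.spark.sql.functions.{col, udf}

// same branching logic as mathAdd, packaged as a reusable column function
val mathAddUdf = udf((a: Int, b: String) => if (b == "apple") a + 1 else a + 2)

// adds the computed value as a new column "c" while keeping a and b
val withC = sqlContext.table("testTable").withColumn("c", mathAddUdf(col("a"), col("b")))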
Upvotes: 1
Reputation: 41987
You can achieve what you intend to do with the case class by using the simple when function and the withColumn API:
import org.apache.spark.sql.functions._
df.withColumn("newCol", when(col("b") === "apple", col("a") + 1).otherwise(col("a") + 2))
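For illustration, a quick example with made-up data (the input values here are assumptions, not from the question):

import sqlContext.implicits._
import org.apache.spark.sql.functions._

val df = Seq((1, "apple"), (5, "pear")).toDF("a", "b")
df.withColumn("newCol", when(col("b") === "apple", col("a") + 1).otherwise(col("a") + 2)).show()
// row (1, "apple") gets newCol = 2; row (5, "pear") gets newCol = 7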
So I guess you don't need a case class.
Upvotes: 1