Reputation: 464
Problem (please give any solutions in Java, not Scala or Python):
I have a DataFrame with the following data:
colA, colB
23,44
24,64
What I want is a DataFrame like this:
colA, colB, colC
23,44, result of myFunction(23,44)
24,64, result of myFunction(24,64)
Basically, I would like to add a column to the DataFrame in Java, where the value of the new column is computed by passing each row's colA and colB values through a complex function that returns a String.
Here is what I've tried, but the argument passed to complexFunction seems to be only the column reference 'colA', rather than the values in colA:
myDataFrame.withColumn("ststs", (complexFunction(myDataFrame.col("colA")))).show();
Upvotes: 1
Views: 3471
Reputation: 1590
As suggested in the comments, you should use a user-defined function (UDF). Suppose you have a myFunction method that does the complex processing (shown here in Scala):
val myFunction : (Int, Int) => String = (colA, colB) => {...}
Then all you need to do is wrap your function in a UDF and apply it to columns colA and colB:
import org.apache.spark.sql.functions.{udf, col}
val myFunctionUdf = udf(myFunction)
myDataFrame.withColumn("colC", myFunctionUdf(col("colA"), col("colB")))
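Since the question asks for Java, here is a rough Java equivalent of the Scala snippet above. It is a sketch under several assumptions: Spark 2.x or later with spark-sql on the classpath (so the DataFrame is a Dataset<Row>), integer columns, and a placeholder myFunction body standing in for the real complex function. The class name AddColumnWithUdf and the registered name "myFunctionUdf" are made up for illustration. The original attempt fails because myDataFrame.col("colA") returns a Column, which is a symbolic reference to the column rather than a per-row value; registering the function as a UDF lets Spark call it once per row with the actual values.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF2;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import java.util.Arrays;
import java.util.List;

import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

public class AddColumnWithUdf {

    // Hypothetical stand-in for the "complex function" from the question.
    // Replace the body with the real logic; it receives one row's values.
    static String myFunction(Integer colA, Integer colB) {
        return colA + "-" + colB;
    }

    public static void main(String[] args) {
        // Local session for demonstration only (an assumption, not from the question).
        SparkSession spark = SparkSession.builder()
                .appName("AddColumnWithUdf")
                .master("local[*]")
                .getOrCreate();

        // Rebuild the sample data from the question.
        StructType schema = new StructType()
                .add("colA", DataTypes.IntegerType)
                .add("colB", DataTypes.IntegerType);
        List<Row> rows = Arrays.asList(
                RowFactory.create(23, 44),
                RowFactory.create(24, 64));
        Dataset<Row> myDataFrame = spark.createDataFrame(rows, schema);

        // Register the function as a two-argument UDF that returns a string.
        spark.udf().register(
                "myFunctionUdf",
                (UDF2<Integer, Integer, String>) AddColumnWithUdf::myFunction,
                DataTypes.StringType);

        // Apply the UDF to colA and colB; Spark evaluates it for every row.
        myDataFrame
                .withColumn("colC", callUDF("myFunctionUdf", col("colA"), col("colB")))
                .show();

        spark.stop();
    }
}
```

On Spark 1.x the same idea should apply, with DataFrame in place of Dataset<Row> and the UDF registered through sqlContext.udf(). If the input columns can contain nulls, myFunction needs to handle null arguments itself.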
I hope this helps.
Upvotes: 0