Reputation: 826
I am trying to encrypt a column in my CSV file using a UDF, but I am getting a compilation error. Here is my code:
import org.apache.spark.sql.functions.{col, udf}
val upperUDF1 = udf { str: String => Encryptor.aes(str) }
val rawDF = spark
.read
.format("csv")
.option("header", "true")
.load(inputPath)
rawDF.withColumn("id", upperUDF1("id")).show() //Compilation error.
The compilation error is on the last line. Am I using incorrect syntax? Thanks in advance.
Upvotes: 3
Views: 1297
Reputation: 18525
In addition to the answer from SCouto, you could also register your UDF as a Spark SQL function:
spark.udf.register("upperUDF2", upperUDF1)
Your subsequent select expression could then look like this:
rawDF.selectExpr("id", "upperUDF2(id)").show()
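As a rough end-to-end sketch of the registration approach: the snippet below assumes a local SparkSession and a small in-memory DataFrame in place of the CSV, and uses a hypothetical fakeEncrypt helper (it only reverses the string, which is not encryption) as a stand-in for Encryptor.aes, whose source is not shown in the question. The AS id alias is an addition here so that the output column keeps its original name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object RegisterUdfExample {
  // Hypothetical placeholder for Encryptor.aes; reversing a string is NOT encryption.
  def fakeEncrypt(s: String): String = if (s == null) null else s.reverse

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run, just for the demo
      .appName("register-udf-example")
      .getOrCreate()
    import spark.implicits._ // needed for toDF on a Seq

    val upperUDF1 = udf { str: String => fakeEncrypt(str) }
    // Make the UDF callable by name from SQL expressions.
    spark.udf.register("upperUDF2", upperUDF1)

    // Stand-in data for the CSV file from the question.
    val rawDF = Seq(("abc", "alice"), ("def", "bob")).toDF("id", "name")

    // "AS id" keeps the column name instead of an auto-generated one.
    rawDF.selectExpr("upperUDF2(id) AS id", "name").show()

    // The registered name also works in plain SQL over a temp view.
    rawDF.createOrReplaceTempView("raw")
    spark.sql("SELECT upperUDF2(id) AS id, name FROM raw").show()

    spark.stop()
  }
}
```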
Upvotes: 1
Reputation: 7926
You should pass a Column, not a String. You can reference a column using either of these syntaxes:
$"<columnName>" // requires import spark.implicits._
col("<columnName>")
So you should try this:
rawDF.withColumn("id", upperUDF1($"id")).show()
or this:
rawDF.withColumn("id", upperUDF1(col("id"))).show()
Personally, I like the dollar syntax the most; it seems more elegant to me.
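For reference, here is a minimal self-contained sketch of both variants. It assumes a local SparkSession and an in-memory DataFrame instead of the CSV file, and the Encryptor object below is a hypothetical stand-in (Base64 encoding, which is not real encryption), since the original Encryptor.aes is not shown in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical stand-in for the question's Encryptor (its source is not shown).
// Base64 is NOT encryption; it only makes the example runnable.
object Encryptor {
  def aes(s: String): String =
    if (s == null) null
    else java.util.Base64.getEncoder.encodeToString(s.getBytes("UTF-8"))
}

object UdfColumnExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run, just for the demo
      .appName("udf-column-example")
      .getOrCreate()
    import spark.implicits._ // enables the $"..." syntax and toDF

    val upperUDF1 = udf { str: String => Encryptor.aes(str) }

    // Stand-in data for the CSV file from the question.
    val rawDF = Seq(("1", "alice"), ("2", "bob")).toDF("id", "name")

    // A UDF is applied to Column arguments, so both of these compile:
    rawDF.withColumn("id", upperUDF1($"id")).show()
    rawDF.withColumn("id", upperUDF1(col("id"))).show()

    // upperUDF1("id") does not compile, because "id" is a String, not a Column.

    spark.stop()
  }
}
```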
Upvotes: 3