Aman Saurav

Reputation: 826

Encrypt a CSV column via UDF, Spark - Scala

I am trying to encrypt a column in my CSV file using a UDF, but I am getting a compilation error. Here is my code:

import org.apache.spark.sql.functions.{col, udf}

val upperUDF1 = udf { str: String => Encryptor.aes(str) }

val rawDF = spark
      .read
      .format("csv")
      .option("header", "true")
      .load(inputPath)

rawDF.withColumn("id", upperUDF1("id")).show() //Compilation error.

I am getting the compilation error on the last line. Am I using the wrong syntax? Thanks in advance.
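The question does not show `Encryptor`. If you want something concrete to wrap in the UDF, here is a minimal sketch of an `aes` helper. It assumes AES-GCM from the JDK's `javax.crypto`, a hard-coded demo key and Base64 output so the result fits in a string column. The key, the `decrypt` method and the IV layout are illustrative assumptions, not the asker's implementation.

```scala
import java.nio.charset.StandardCharsets
import java.security.SecureRandom
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, SecretKeySpec}

// Hypothetical stand-in for the asker's Encryptor object (not shown in the question).
object Encryptor {
  // Demo-only 128-bit key (assumption); load a real key from a secret store instead.
  private val key = new SecretKeySpec("0123456789abcdef".getBytes(StandardCharsets.UTF_8), "AES")
  private val random = new SecureRandom()
  private val IvLength = 12  // 96-bit IV, the usual size for GCM
  private val TagBits = 128  // authentication tag length in bits

  // Encrypts with a fresh random IV and returns Base64(iv ++ ciphertext); nulls pass through.
  def aes(plain: String): String =
    if (plain == null) null
    else {
      val iv = new Array[Byte](IvLength)
      random.nextBytes(iv)
      // Cipher instances are not thread-safe, so create one per call.
      val cipher = Cipher.getInstance("AES/GCM/NoPadding")
      cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TagBits, iv))
      val encrypted = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8))
      Base64.getEncoder.encodeToString(iv ++ encrypted)
    }

  // Reverses aes: splits off the IV prefix and decrypts the rest.
  def decrypt(encoded: String): String =
    if (encoded == null) null
    else {
      val (iv, encrypted) = Base64.getDecoder.decode(encoded).splitAt(IvLength)
      val cipher = Cipher.getInstance("AES/GCM/NoPadding")
      cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TagBits, iv))
      new String(cipher.doFinal(encrypted), StandardCharsets.UTF_8)
    }
}
```

Each call uses a fresh random IV, so the same input encrypts to different outputs. That is safer, but it means the encrypted column cannot be used for joins or equality lookups.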

Upvotes: 3

Views: 1297

Answers (2)

Michael Heil

Reputation: 18525

In addition to the answer from SCouto, you can also register your UDF as a Spark SQL function:

spark.udf.register("upperUDF2", upperUDF1)

A subsequent select expression can then call it by name:

rawDF.selectExpr("id", "upperUDF2(id)").show()
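Here is a self-contained sketch of the registration approach. It assumes a local `SparkSession`, an in-memory DataFrame and a hypothetical `fakeEncrypt` (it just reverses the string) in place of `Encryptor.aes`. It also shows why a plain column name works here: inside `selectExpr`, `id` is parsed as SQL text rather than passed to the UDF's Scala `apply` method.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.udf

object SqlUdfExample {
  // Hypothetical stand-in for Encryptor.aes: reverses the string so the effect is visible.
  def fakeEncrypt(s: String): String = if (s == null) null else s.reverse

  // Wrap the Scala function as a UserDefinedFunction, as in the question.
  val upperUDF1 = udf((str: String) => fakeEncrypt(str))

  def encryptWithSql(spark: SparkSession, rawDF: DataFrame): DataFrame = {
    // Register the UDF under a name that Spark SQL expressions can resolve.
    spark.udf.register("upperUDF2", upperUDF1)
    // The expression strings are parsed as SQL, so "id" refers to the column.
    rawDF.selectExpr("id", "upperUDF2(id) AS id_enc")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udf-sql").getOrCreate()
    import spark.implicits._
    encryptWithSql(spark, Seq("abc", "xyz").toDF("id")).show()
    spark.stop()
  }
}
```

Once registered, the same name can also be used in `spark.sql(...)` queries against a temp view.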

Upvotes: 1

SCouto

Reputation: 7926

You need to pass a Column, not a String. You can reference a column with either of these syntaxes:

$"<columnName>"     // needs import spark.implicits._
col("<columnName>") // needs import org.apache.spark.sql.functions.col

So you should try this:

rawDF.withColumn("id", upperUDF1($"id")).show()

or this:

rawDF.withColumn("id", upperUDF1(col("id"))).show()

Personally, I prefer the dollar syntax; it seems more elegant to me.
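Here is a self-contained sketch of both fixes. It assumes a local `SparkSession`, an in-memory DataFrame in place of the CSV read and a hypothetical `fakeEncrypt` standing in for `Encryptor.aes`. The original call fails because `UserDefinedFunction.apply` takes `Column` arguments, so `upperUDF1("id")` does not type-check.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object ColumnUdfExample {
  // Hypothetical stand-in for Encryptor.aes.
  def fakeEncrypt(s: String): String = if (s == null) null else s.reverse

  // UserDefinedFunction.apply expects Column arguments, not column-name Strings.
  val upperUDF1 = udf((str: String) => fakeEncrypt(str))

  // Option 1: col("id") builds a Column from its name.
  def encryptWithCol(rawDF: DataFrame): DataFrame =
    rawDF.withColumn("id", upperUDF1(col("id")))

  // Option 2: $"id" builds the same Column, but needs the session's implicits in scope.
  def encryptWithDollar(spark: SparkSession, rawDF: DataFrame): DataFrame = {
    import spark.implicits._
    rawDF.withColumn("id", upperUDF1($"id"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("udf-column").getOrCreate()
    import spark.implicits._
    val rawDF = Seq("abc", "xyz").toDF("id") // stands in for the CSV read in the question
    encryptWithCol(rawDF).show()
    encryptWithDollar(spark, rawDF).show()
    spark.stop()
  }
}
```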

Upvotes: 3
