Reputation: 1307
I need to create a new column called hash_id from the uid column of my DataFrame. Below is my code:
// 1. Define a hashing function
def calculate_hashid(uid: String): BigInteger = {
  val md = java.security.MessageDigest.getInstance("SHA-1")
  new BigInteger(DatatypeConverter.printHexBinary(md.digest(uid.getBytes)), 16).mod(BigInteger.valueOf(10000))
}
// 2. Convert function to UDF
val calculate_hashidUDF = udf(calculate_hashid)
// 3. Apply UDF on Spark DataFrame
val userAgg_Data_hashid = userAgg_Data.withColumn("hash_id", calculate_hashidUDF($"uid"))
I am getting an error at udf(calculate_hashid) saying:
missing arguments for the method calculate_hashid(string)
I have gone through many examples online but could not resolve it. What am I missing here?
Upvotes: 1
Views: 1355
Reputation: 23109
You can define your UDF with explicit type parameters (return type first, then the argument type):
val calculate_hashidUDF = udf[BigInteger, String](calculate_hashid)
You can also rewrite your UDF as an anonymous function with an explicit function type:
def calculate_hashidUDF = udf(((uid: String) => {
  val md = java.security.MessageDigest.getInstance("SHA-1")
  new BigInteger(DatatypeConverter.printHexBinary(md.digest(uid.getBytes)), 16).mod(BigInteger.valueOf(10000))
}): String => BigInteger)
Or even without the explicit function type:
def calculate_hashidUDF = udf((uid: String) => {
  val md = java.security.MessageDigest.getInstance("SHA-1")
  new BigInteger(DatatypeConverter.printHexBinary(md.digest(uid.getBytes)), 16).mod(BigInteger.valueOf(10000))
})
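As for why the original call fails: calculate_hashid is declared with def, so it is a method rather than a function value, and udf is overloaded, so the compiler most likely has no single expected type to convert the method automatically. Appending an underscore, as in udf(calculate_hashid _), requests that conversion (eta-expansion) explicitly. Below is a minimal Spark-free sketch of the same conversion. The object name HashIdSketch, the camel-case names, the sample input "user-42", the UTF-8 charset, and the use of new BigInteger(1, bytes) in place of DatatypeConverter are assumptions made for illustration, not part of the original code.

```scala
import java.math.BigInteger
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object HashIdSketch {
  // A method (def), analogous to calculate_hashid in the question.
  def calculateHashId(uid: String): BigInteger = {
    val md = MessageDigest.getInstance("SHA-1")
    // Assumption: UTF-8 is used here; the original relies on the platform default charset.
    val digest = md.digest(uid.getBytes(StandardCharsets.UTF_8))
    // signum = 1 reads the digest bytes as a non-negative integer,
    // the same value as parsing the digest's hex string in base 16.
    new BigInteger(1, digest).mod(BigInteger.valueOf(10000))
  }

  // Eta-expansion: the trailing underscore turns the method into a function value.
  val calculateHashIdFn: String => BigInteger = calculateHashId _

  def main(args: Array[String]): Unit = {
    // "user-42" is a hypothetical sample uid.
    println(calculateHashIdFn("user-42"))
  }
}
```

In a Spark session, a function value like calculateHashIdFn can be passed directly, for example udf(calculateHashIdFn), because its type String => BigInteger is already known to the compiler.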
Upvotes: 2