Reputation: 1702
I want to normalize author names by removing accents.
Input: orčpžsíáýd
Output: orcpzsiayd
The code below lets me achieve this for a plain string. However, I am not sure how I can do this using Spark functions when my input is a DataFrame column.
def stringNormalizer(c: Column) = {
  import org.apache.commons.lang.StringUtils
  StringUtils.stripAccents(c.toString)
}
This is how I want to be able to call it:
val normalizedAuthor = flat_author.withColumn("NormalizedAuthor",
  stringNormalizer(df_article("authors")))
I have just started learning Spark, so please let me know if there is a better way to achieve this without UDFs.
Upvotes: 0
Views: 639
Reputation: 11
Although it doesn't look as pretty, I found that it took half the time to remove accents this way, without a UDF:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, regexp_replace, upper}

def withColumnFormated(columnName: String)(df: DataFrame): DataFrame = {
  val dfWithColumnUpper = df.withColumn(columnName, upper(col(columnName)))
  val accents: Map[String, String] = Map(
    "[ÃÁÀÂÄ]" -> "A", "[ÉÈÊË]" -> "E", "[ÍÌÎÏ]" -> "I",
    "[Ñ]" -> "N", "[ÓÒÔÕÖ]" -> "O", "[ÚÙÛÜ]" -> "U",
    "[Ç]" -> "C")
  accents.foldLeft(dfWithColumnUpper) {
    case (tempDf, (pattern, replacement)) =>
      tempDf.withColumn(columnName,
        regexp_replace(col(columnName), lit(pattern), lit(replacement)))
  }
}
And then you can apply it like this:
df_article.transform(withColumnFormated("authors"))
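The per-row effect of the foldLeft can be seen without Spark at all. Below is a minimal plain-Scala sketch of the same uppercase-then-replace logic (the sample name `"José Álvarez"` is an illustrative input, not from the question). Note that the map only covers the characters listed in it, so characters such as č or ž from the question's example input would need their own entries:

```scala
// Same replacement map and foldLeft shape as the DataFrame version,
// applied to a single String instead of a column.
object AccentDemo {
  val accents: Map[String, String] = Map(
    "[ÃÁÀÂÄ]" -> "A", "[ÉÈÊË]" -> "E", "[ÍÌÎÏ]" -> "I",
    "[Ñ]" -> "N", "[ÓÒÔÕÖ]" -> "O", "[ÚÙÛÜ]" -> "U",
    "[Ç]" -> "C")

  // Uppercase first, then apply each regex -> replacement pair in turn.
  def normalize(s: String): String =
    accents.foldLeft(s.toUpperCase) { case (acc, (pattern, replacement)) =>
      acc.replaceAll(pattern, replacement)
    }
}

println(AccentDemo.normalize("José Álvarez"))  // JOSE ALVAREZ
```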
Upvotes: 1
Reputation:
It requires a UDF:
import org.apache.commons.lang.StringUtils
import org.apache.spark.sql.functions.{col, udf}

val stringNormalizer = udf((s: String) => StringUtils.stripAccents(s))
df_article.select(stringNormalizer(col("authors")))
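For reference, the effect of `StringUtils.stripAccents` can be reproduced with only the JDK, via Unicode NFD decomposition followed by stripping combining marks. A minimal plain-Scala sketch, handy if commons-lang is not on the classpath:

```scala
import java.text.Normalizer

object StripAccents {
  // Decompose accented characters into a base character plus a combining
  // mark (NFD), then remove the combining marks (\p{M}).
  def apply(s: String): String =
    Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "")
}

println(StripAccents("orčpžsíáýd"))  // orcpzsiayd
```

This function could be wrapped in a `udf` exactly like the commons-lang version above.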
Upvotes: 1