max.kuzmentsov

Reputation: 796

Apache Spark: UDF column based on another column without passing its name as an argument

I have a Dataset with a column firm, and I'm adding another column, firm_id, to it. Here's an example:

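import scala.collection.mutable
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf
import spark.implicits._ // enables the $"..." column syntax
private val firms: mutable.Map[String, Integer] = ...
private val firmIdFromCode: (String => Integer) = (code: String) => firms(code)
val firm_id_by_code: UserDefinedFunction = udf(firmIdFromCode)
...
val ds = dataset.withColumn("firm_id", firm_id_by_code($"firm"))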

Is there a way to avoid passing $"firm" as an argument, given that this column is always present in the Dataset? I'm looking for something like this:

val ds = dataset.withColumn("firm_id", firm_id_by_code)

Upvotes: 3

Views: 222

Answers (1)

Davis Broda

Reputation: 4125

You can supply the input column when you define the UDF, by applying the UDF to that column right away:

// apply() binds the input column now; the result is a Column, not a UserDefinedFunction
val someUdf = udf { /* udf code */ }.apply($"colName")

// Usage in dataset
val ds = dataset.withColumn("newColName", someUdf)
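Below is a minimal, self-contained sketch of this approach applied to the question's firm/firm_id case, assuming Spark 3.x running in local mode. The FirmIdExample object name and the contents of the firms map are hypothetical. One deliberate swap: it uses firms.getOrElse(code, null) instead of the question's firms(code), so an unknown code produces a null firm_id instead of failing the job with a NoSuchElementException.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object FirmIdExample {
  // Hypothetical lookup table standing in for the question's `firms` map
  val firms: Map[String, Integer] =
    Map("ACME" -> Integer.valueOf(1), "GLOBEX" -> Integer.valueOf(2))

  // java.lang.Integer (not Int) so that unknown codes can map to null
  val firmIdFromCode: String => Integer = code => firms.getOrElse(code, null)

  // The input column is bound here, once; firmIdCol is a Column expression.
  // Building it does not need a SparkSession.
  val firmIdCol: Column = udf(firmIdFromCode).apply(col("firm"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("firm-id-example")
      .getOrCreate()
    import spark.implicits._

    val dataset = Seq("ACME", "GLOBEX", "INITECH").toDF("firm")

    // No column argument at the call site
    val ds = dataset.withColumn("firm_id", firmIdCol)
    ds.show()

    spark.stop()
  }
}
```

Because col("firm") is resolved by name against whichever Dataset the expression is used with, the same firmIdCol can be reused on any Dataset that has a firm column. If that column is missing, Spark reports an analysis error when withColumn is called.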

Upvotes: 4
