rajeev_49

Reputation: 13

Pass arguments to a udf from columns present in a list of strings

I have a list of strings that are column names in a DataFrame. I want to pass the values of these columns as arguments to a UDF. How can I do this in Spark with Scala?

   val actualDF = Seq(
             ("beatles", "help|hey jude","sad",4),
             ("romeo", "eres mia","old school",56)
            ).toDF("name", "hit_songs","genre","xyz")


   val column_list: List[String] = List("hit_songs","name","genre")

   // example udf
   val testudf = org.apache.spark.sql.functions.udf((s1: String, s2: String) => {
     // let's say I want to concatenate all values
     s1 + s2
   })


   val finalDF = actualDF.withColumn("test_res",testudf(col(column_list(0))))

In the example above, I want to pass every column named in column_list to the UDF. I am not sure how to pass a whole list of strings representing column names, although for a single element I can use col(column_list(0)). Please help.

Upvotes: 1

Views: 1134

Answers (2)

s.polam

Reputation: 10362

In the actualDF shown below, hit_songs is an array column, which a Scala UDF receives as Seq[String]. You therefore need to change the first parameter of your UDF to Seq[String].

scala> singersDF.show(false)
+-------+-------------+----------+
|name   |hit_songs    |genre     |
+-------+-------------+----------+
|beatles|help|hey jude|sad       |
|romeo  |eres mia     |old school|
+-------+-------------+----------+
scala> actualDF.show(false)
+-------+----------------+----------+
|name   |hit_songs       |genre     |
+-------+----------------+----------+
|beatles|[help, hey jude]|sad       |
|romeo  |[eres mia]      |old school|
+-------+----------------+----------+
scala> column_list
res27: List[String] = List(hit_songs, name)
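
The transcript above does not show how actualDF was derived from singersDF. One plausible way, assuming the songs are separated by a pipe character, is to use split. This is a sketch and not necessarily what was run here; the session setup and app name are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("split-sketch").getOrCreate()
import spark.implicits._

val singersDF = Seq(
  ("beatles", "help|hey jude", "sad"),
  ("romeo", "eres mia", "old school")
).toDF("name", "hit_songs", "genre")

// split takes a regular expression, so the pipe has to be escaped.
// The result column has type array<string>, seen as Seq[String] inside a UDF.
val actualDF = singersDF.withColumn("hit_songs", split(col("hit_songs"), "\\|"))
```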

Change your UDF as shown below.

// s1 is of type Seq[String]
val testudf = udf((s1:Seq[String],s2:String) => {
    s1.mkString.concat(s2)
})

Applying UDF

scala> actualDF
.withColumn("test_res",testudf(col(column_list.head),col(column_list.last)))
.show(false)
+-------+----------------+----------+-------------------+
|name   |hit_songs       |genre     |test_res           |
+-------+----------------+----------+-------------------+
|beatles|[help, hey jude]|sad       |helphey judebeatles|
|romeo  |[eres mia]      |old school|eres miaromeo      |
+-------+----------------+----------+-------------------+

Without UDF

scala> actualDF.withColumn("test_res",concat_ws("",$"name",$"hit_songs")).show(false) // Without UDF.
+-------+----------------+----------+-------------------+
|name   |hit_songs       |genre     |test_res           |
+-------+----------------+----------+-------------------+
|beatles|[help, hey jude]|sad       |beatleshelphey jude|
|romeo  |[eres mia]      |old school|romeoeres mia      |
+-------+----------------+----------+-------------------+
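
The same idea can be driven by the list of column names from the question rather than hard-coded columns. The sketch below assumes the question's original data, where hit_songs is still a plain string, and maps each name to a Column before expanding the list with `: _*`. The session setup is illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat_ws}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("concat-ws-sketch").getOrCreate()
import spark.implicits._

val actualDF = Seq(
  ("beatles", "help|hey jude", "sad", 4),
  ("romeo", "eres mia", "old school", 56)
).toDF("name", "hit_songs", "genre", "xyz")

val column_list: List[String] = List("hit_songs", "name", "genre")

// column_list.map(col) turns List[String] into List[Column];
// `: _*` passes the list elements as individual varargs.
val finalDF = actualDF.withColumn("test_res", concat_ws("", column_list.map(col): _*))
```

Because concat_ws accepts any number of columns, this works for a list of any length without changing the code.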

Upvotes: 1

Michael Heil

Reputation: 18475

Replace

testudf(col(column_list(0)))

with

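testudf(column_list.map(col): _*)

This passes the list elements as individual input arguments. The names have to be mapped to Column objects first, because the UDF accepts Column arguments and not strings, and the number of elements in the list must match the number of parameters the UDF declares.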
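
As a fuller sketch of this approach, using the question's original data where every listed column is a string: concat3 is a hypothetical three-parameter UDF matching the three names in column_list. When the list length is not known in advance, an alternative is to wrap the columns in array(...) and let a single-parameter UDF (here the hypothetical concatAll) receive them as one Seq[String]. The session setup is illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, udf}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("udf-list-sketch").getOrCreate()
import spark.implicits._

val actualDF = Seq(
  ("beatles", "help|hey jude", "sad", 4),
  ("romeo", "eres mia", "old school", 56)
).toDF("name", "hit_songs", "genre", "xyz")

val column_list: List[String] = List("hit_songs", "name", "genre")

// Option 1: fixed arity. The UDF declares exactly as many parameters
// as there are names in column_list.
val concat3 = udf((s1: String, s2: String, s3: String) => s1 + s2 + s3)
val viaVarargs = actualDF.withColumn("test_res", concat3(column_list.map(col): _*))

// Option 2: any arity. The columns are packed into one array column,
// so the UDF takes a single Seq[String] regardless of the list length.
val concatAll = udf((values: Seq[String]) => values.mkString)
val viaArray = actualDF.withColumn("test_res", concatAll(array(column_list.map(col): _*)))
```

Option 2 requires all listed columns to share one type, since array does not mix types; neither UDF in this sketch handles null values.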

Upvotes: 1
