Reputation: 2158
I'm looking for a way to apply a function to my DataFrame using UDF. My DataFrame looks like this:
+--------------------+-----+
| TOPIC|COUNT|
+--------------------+-----+
| [outlook]| 71|
| [AppsOnDemand]| 12|
| [OUTLOOK, OUTLOOK]| 1|
| [SkyPe]| 3|
| [Citrix, VPN]| 1|
| [Citrix]| 31|
| [VPN]| 51|
| [PANDA, panda]| 1|
| [SKYPE, SKYPE]| 2|
| [panda]| 5|
| [Cisco]| 75|
| [télétravail]| 14|
| [vpn]| 4|
| [OUTLOOK]| 212|
|[SKYPE, télétravail]| 2|
| [appsondemand]| 1|
| [WIFI]| 5|
| [CISCO, CISCO]| 4|
| [MOOC]| 2|
| [PANDA, Panda]| 1|
+--------------------+-----+
My objective is to loop over the lists in the "TOPIC" column and change the strings from lowercase to uppercase. So I need a simple Scala function that takes an array of strings as input and returns the uppercase version of those strings. When the column held plain strings, it was very simple. I just did this:
import org.apache.spark.sql.functions.{array, col, count, lit, udf, upper}
DF.select($"COUNT", upper($"TOPIC")).show()
I was trying this, but it doesn't work:
def myFunc(context: Array[Seq[String]]) = udf {
  (topic: Seq[String]) => context.toString().toUpperCase
}
val Df = (df
  .where('TOPIC.isNotNull)
  .select($"TOPIC", $"COUNT",
    myFunc(context)($"TOPIC").alias("NEW_TOPIC"))
)
Upvotes: 0
Views: 2413
Reputation: 2371
Define your function as follows:
import org.apache.spark.sql.functions._
val arrayUpperCase = udf[Seq[String], Seq[String]](_.map(_.toUpperCase))
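The lambda wrapped by the udf is ordinary Scala, so you can sanity-check it without a SparkSession (a minimal sketch; `arrayUpperCaseFn` is just an illustrative name):

```scala
// The core transform the udf wraps: uppercase every element of a Seq[String].
val arrayUpperCaseFn: Seq[String] => Seq[String] = _.map(_.toUpperCase)

println(arrayUpperCaseFn(Seq("Citrix", "VPN")))  // List(CITRIX, VPN)
```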
and then
df.select($"TOPIC", $"COUNT", arrayUpperCase($"TOPIC").alias("NEW_TOPIC")).show(false)
returns
+--------------------+-----+--------------------+
|TOPIC |COUNT|NEW_TOPIC |
+--------------------+-----+--------------------+
|[outlook] |71 |[OUTLOOK] |
|[AppsOnDemand] |12 |[APPSONDEMAND] |
|[OUTLOOK, OUTLOOK] |1 |[OUTLOOK, OUTLOOK] |
|[SkyPe] |3 |[SKYPE] |
|[Citrix, VPN] |1 |[CITRIX, VPN] |
|[Citrix] |31 |[CITRIX] |
|[VPN] |51 |[VPN] |
|[PANDA, panda] |1 |[PANDA, PANDA] |
|[SKYPE, SKYPE] |2 |[SKYPE, SKYPE] |
|[panda] |5 |[PANDA] |
|[Cisco] |75 |[CISCO] |
|[télétravail] |14 |[TÉLÉTRAVAIL] |
|[vpn] |4 |[VPN] |
|[OUTLOOK] |212 |[OUTLOOK] |
|[SKYPE, télétravail]|2 |[SKYPE, TÉLÉTRAVAIL]|
|[appsondemand] |1 |[APPSONDEMAND] |
|[WIFI] |5 |[WIFI] |
|[CISCO, CISCO] |4 |[CISCO, CISCO] |
|[MOOC] |2 |[MOOC] |
|[PANDA, Panda] |1 |[PANDA, PANDA] |
+--------------------+-----+--------------------+
Upvotes: 3
Reputation: 41957
You should write a udf function as below:
import org.apache.spark.sql.functions._
def upperUdf = udf((array: collection.mutable.WrappedArray[String]) => array.map(_.toUpperCase))
and call it using withColumn as:
df.withColumn("TOPIC", upperUdf($"TOPIC"))
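A Scala udf receives an array column as a Seq-compatible wrapper (WrappedArray on Scala 2.12), so typing the parameter as plain `Seq[String]` works just as well and is less tied to a Scala version. The element-wise transform itself is ordinary Scala (a standalone sketch; `upperAll` is an illustrative name):

```scala
// Typing the parameter as Seq[String] accepts WrappedArray and any other Seq,
// so the same body works regardless of the concrete wrapper Spark supplies.
def upperAll(xs: Seq[String]): Seq[String] = xs.map(_.toUpperCase)

println(upperAll(Vector("SkyPe", "télétravail")))  // Vector(SKYPE, TÉLÉTRAVAIL)
```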
You should get the following output:
+--------------------+-----+
|TOPIC |COUNT|
+--------------------+-----+
|[OUTLOOK] |71 |
|[APPSONDEMAND] |12 |
|[OUTLOOK, OUTLOOK] |1 |
|[SKYPE] |3 |
|[CITRIX, VPN] |1 |
|[CITRIX] |31 |
|[VPN] |51 |
|[PANDA, PANDA] |1 |
|[SKYPE, SKYPE] |2 |
|[PANDA] |5 |
|[CISCO] |75 |
|[TÉLÉTRAVAIL] |14 |
|[VPN] |4 |
|[OUTLOOK] |212 |
|[SKYPE, TÉLÉTRAVAIL]|2 |
|[APPSONDEMAND] |1 |
|[WIFI] |5 |
|[CISCO, CISCO] |4 |
|[MOOC] |2 |
|[PANDA, PANDA] |1 |
+--------------------+-----+
Upvotes: 2