user1571307

Reputation: 335

How to extract String from a Column datatype in Spark Scala?

I have a function which accepts a String parameter and does a "match" on it to determine return values, like this -

Edit (complete function):

def getSubscriptionDaysFunc(account_status: Column, created_at: org.apache.spark.sql.Column, updated_at: org.apache.spark.sql.Column): org.apache.spark.sql.Column = {
  account_status match {
    case "expired"   => datediff(updated_at, created_at)
    case "cancelled" => datediff(updated_at, created_at)
    case "active"    => datediff(updated_at, current_date())
    case default     => null
  }
}

This function is called in this way -

df.withColumn("subscription_days", getSubscriptionDaysFunc($"account_status",$"created_at",$"updated_at"))

Here $"account_status" returns a "Column" value. How do I get the String value from the "Column" object?

Edit: I also tried writing a UDF in the following way -

val getSubscriptionDaysFunc = udf((account_status: String, created_at: org.apache.spark.sql.Column, updated_at: org.apache.spark.sql.Column): Column => {
  account_status match {
    case "expired"   => datediff(updated_at, created_at)
    case "cancelled" => datediff(updated_at, created_at)
    case "active"    => datediff(updated_at, current_date())
    case default     => null
  }
})

This gives the error -

"error: illegal start of declaration account_status match {" .

Upvotes: 2

Views: 1484

Answers (1)

Raphael Roth

Reputation: 27373

I think what you want to do is to implement a UDF:

import org.apache.spark.sql.functions.udf

val getSubscriptionDaysFunc = udf((account_status:String) =>  {
  account_status match {
    case "expired" =>//some logic
    case "cancelled" =>//some logic
    case "active" =>//some logic
    case default => null
  } 
})

df.withColumn("subscription_days", getSubscriptionDaysFunc($"account_status"))

Upvotes: 1
