scalacode

Reputation: 1106

UDF spark scala compute result based on input range

I wrote this UDF:

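import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

// BREAKFAST, LUNCH, SNACK, DINNER and OTHER are constants defined elsewhere.
val kind: UserDefinedFunction = udf((col1: Int, col2: Float, col3: Float) => {
    if (col1 < 11) {
      BREAKFAST
    } else if (col1 >= 11 && col1 <= 14 && (col2 > 2 || col3 > 0)) {
      LUNCH
    } else if (col1 > 14 && col1 < 18) {
      SNACK
    } else if (col1 >= 18) {
      DINNER
    } else {
      OTHER
    }
})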

But when I apply it to my DataFrame, I get null values in the output column even where the input column is not null.

When I apply it to an input DataFrame and select a few columns, I get:

+------------+--------+
|        MEAL|INT_HOUR|
+------------+--------+
|        null|      15|
|        null|      15|
|        null|      15|
|        null|      18|
|        null|      17|
|        null|      14|
|        null|      11|
|        null|    null|
|        null|    null|
|        null|    null|
|        null|    null|
|        null|    null|
|       LUNCH|      13|
|        null|      11|
|        null|      14|
|        null|      15|
|        null|      15|
|        null|      14|
|        null|    null|
|        null|      11|
+------------+--------+

Any idea how to fix this?

Thanks

Upvotes: 1

Views: 85

Answers (1)

Jason Heo

Reputation: 10246

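This happens because your data contains null values. When a UDF declares a primitive parameter type such as Int or Float, Spark returns null for any row where that input is null, without calling the function.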

You need to change the parameter types from the primitive Int to the boxed Integer (and likewise from Float to java.lang.Float for your other two columns).

Let's look at an example.

scala> df.printSchema
root
 |-- age: integer (nullable = true)
 |-- height: integer (nullable = true)


scala> df.show
+---+------+
|age|height|
+---+------+
| 10|    10|
| 10|  null|
| 15|  null|
+---+------+

val kind = udf((col1: Int, col2: Int) => {
    if (col1 <  11) {
      "case 1"
    } else if (col1 >= 11 && col1 <= 14 && col2 > 2) {
      "case 2"
    }
    else if (col1 > 14 && col1 < 18) {
      "case 3"
    }
    else if (col1 >= 18) {
      "case 4"
    }
    else {
      "case 5"
    }
  }
)

The DataFrame has three rows, and the UDF looks like yours.

When I call the UDF, I get null in every row where height is null:

scala> df.withColumn("output", kind(df("age"), df("height"))).show
+---+------+------+
|age|height|output|
+---+------+------+
| 10|    10|case 1|
| 10|  null|  null|
| 15|  null|  null|
+---+------+------+

But if I modify the UDF like this:

val kind = udf((col1: Integer, col2: Integer) => {
    if (col1 <  11) {
      "case 1"
    } else if (col1 >= 11 && col1 <= 14 && col2 > 2) {
      "case 2"
    }
    else if (col1 > 14 && col1 < 18) {
      "case 3"
    }
    else if (col1 >= 18) {
      "case 4"
    }
    else {
      "case 5"
    }
  }
)

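the output no longer contains null values, because an Int cannot hold null but an Integer can, so Spark now calls the function for every row. Note that comparing an Integer that is actually null still throws a NullPointerException when it is unboxed; that does not happen here only because age is never null and col2 is never evaluated for these three rows.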

scala> df.withColumn("output", kind(df("age"), df("height"))).show
+---+------+------+
|age|height|output|
+---+------+------+
| 10|    10|case 1|
| 10|  null|case 1|
| 15|  null|case 3|
+---+------+------+
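
For the three-column UDF in the question, a null-safe variant could look roughly like the sketch below. It is only an illustration: the object name MealKind, the method name classify, and the string labels are assumptions standing in for the question's BREAKFAST, LUNCH, SNACK, DINNER and OTHER constants, whose definitions are not shown. It treats a null hour as OTHER and a null col2 or col3 as "condition not met", which may or may not be the behaviour you want.

```scala
object MealKind {
  // Hypothetical labels; the question does not show how its constants are defined.
  val Breakfast = "BREAKFAST"
  val Lunch     = "LUNCH"
  val Snack     = "SNACK"
  val Dinner    = "DINNER"
  val Other     = "OTHER"

  // Boxed parameter types, so that null inputs reach the function body.
  def classify(hour: java.lang.Integer,
               col2: java.lang.Float,
               col3: java.lang.Float): String = {
    // Option(x) is None when x is null, so a null is never unboxed.
    val col2Above2 = Option(col2).exists(_.floatValue > 2f)
    val col3Above0 = Option(col3).exists(_.floatValue > 0f)

    Option(hour).map(_.intValue) match {
      case None               => Other      // assumption: null hour maps to OTHER
      case Some(h) if h < 11  => Breakfast
      case Some(h) if h <= 14 => if (col2Above2 || col3Above0) Lunch else Other
      case Some(h) if h < 18  => Snack
      case Some(_)            => Dinner
    }
  }
}
```

With Spark on the classpath, this should be usable as a UDF via udf(MealKind.classify _), since boxed Java types are accepted as UDF parameter types.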

Upvotes: 1
