Reputation: 1106
I wrote this UDF:
val kind: UserDefinedFunction = udf((col1: Int, col2: Float, col3: Float) => {
  if (col1 < 11) {
    BREAKFAST
  } else if (col1 >= 11 && col1 <= 14 && (col2 > 2 || col3 > 0)) {
    LUNCH
  } else if (col1 > 14 && col1 < 18) {
    SNACK
  } else if (col1 >= 18) {
    DINNER
  } else {
    OTHER
  }
})
But when I apply it to my DataFrame, I get null values in the kind column for almost every row, even where the input column is not null.
Applying it to an input DataFrame and selecting a few columns gives:
+------------+--------+
|        MEAL|INT_HOUR|
+------------+--------+
|        null|      15|
|        null|      15|
|        null|      15|
|        null|      18|
|        null|      17|
|        null|      14|
|        null|      11|
|        null|    null|
|        null|    null|
|        null|    null|
|        null|    null|
|        null|    null|
|       LUNCH|      13|
|        null|      11|
|        null|      14|
|        null|      15|
|        null|      15|
|        null|      14|
|        null|    null|
|        null|      11|
+------------+--------+
Any idea how to fix this, please?
Thanks
Upvotes: 1
Views: 85
Reputation: 10246
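This is because your data has null values. When a UDF parameter is a Scala primitive such as Int, Spark cannot pass null into it, so it returns null for any row where that input is null without calling your function.
You need to change the parameter type from Int to Integer (java.lang.Integer), which can hold null.
Let's look at an example.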
scala> df.printSchema
root
|-- age: integer (nullable = true)
|-- height: integer (nullable = true)
scala> df.show
+---+------+
|age|height|
+---+------+
| 10|    10|
| 10|  null|
| 15|  null|
+---+------+
val kind = udf((col1: Int, col2: Int) => {
  if (col1 < 11) {
    "case 1"
  } else if (col1 >= 11 && col1 <= 14 && col2 > 2) {
    "case 2"
  } else if (col1 > 14 && col1 < 18) {
    "case 3"
  } else if (col1 >= 18) {
    "case 4"
  } else {
    "case 5"
  }
})
I have 3 rows and the UDF looks like yours.
When I call the UDF, I get null for every row where height is null.
scala> df.withColumn("output", kind(df("age"), df("height"))).show
+---+------+------+
|age|height|output|
+---+------+------+
| 10|    10|case 1|
| 10|  null|  null|
| 15|  null|  null|
+---+------+------+
But if I modify the UDF like this:
val kind = udf((col1: Integer, col2: Integer) => {
  if (col1 < 11) {
    "case 1"
  } else if (col1 >= 11 && col1 <= 14 && col2 > 2) {
    "case 2"
  } else if (col1 > 14 && col1 < 18) {
    "case 3"
  } else if (col1 >= 18) {
    "case 4"
  } else {
    "case 5"
  }
})
the output no longer contains null values, because Int is a primitive type and cannot hold null, whereas Integer is a reference type and can.
scala> df.withColumn("output", kind(df("age"), df("height"))).show
+---+------+------+
|age|height|output|
+---+------+------+
| 10|    10|case 1|
| 10|  null|case 1|
| 15|  null|case 3|
+---+------+------+
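One caveat: switching to Integer only makes Spark call the function for those rows. The body still has to deal with null itself. The example above works because, for these three rows, the branches that are reached never look at the null height. Below is a minimal null-safe sketch of the same branching. It assumes a null age should fall through to "case 5", which is my assumption and not something stated in the question. classify is a hypothetical name, and the logic is kept as a plain function so it can be tried without a Spark session.

```scala
// Null-safe variant of the branching logic from the example above.
// `classify` is a hypothetical name; mapping a null age to "case 5"
// is an assumption made for illustration.
def classify(age: Integer, height: Integer): String = {
  // Option(x) turns a null reference into None, so a null is never unboxed.
  val h: Option[Int] = Option(height).map(_.intValue)
  Option(age).map(_.intValue) match {
    case None                                  => "case 5" // age is null
    case Some(a) if a < 11                     => "case 1"
    case Some(a) if a <= 14 && h.exists(_ > 2) => "case 2" // here 11 <= a <= 14
    case Some(a) if a > 14 && a < 18           => "case 3"
    case Some(a) if a >= 18                    => "case 4"
    case _                                     => "case 5" // 11..14 with null or small height
  }
}
```

In spark-shell this should be usable as val kind = udf(classify _), applied the same way as before with df.withColumn("output", kind(df("age"), df("height"))).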
Upvotes: 1