Reputation: 124
Is there any way to prevent spark sql functions from nulling values?
For example I have the following dataframe
df.show
+--------------------+--------------+------+------------+
| Title|Year Published|Rating|Length (Min)|
+--------------------+--------------+------+------------+
| 101 Dalmatians| 01/1996| G| 103|
|101 Dalmatians (A...| 1961| G| 79|
|101 Dalmations II...| 2003| G| 70|
I want to apply spark sqls date_format
function to Year Published
column.
val sql = """date_format(`Year Published`, 'MM/yyyy')"""
val df2 = df.withColumn("Year Published", expr(sql))
df2.show
+--------------------+--------------+------+------------+
| Title|Year Published|Rating|Length (Min)|
+--------------------+--------------+------+------------+
| 101 Dalmatians| null| G| 103|
|101 Dalmatians (A...| 01/1961| G| 79|
|101 Dalmations II...| 01/2003| G| 70|
The first row of the Year Published
column has been nulled as the original value was in a different date format than the other dates.
This behaviour is not unique to date_format
for example format_number
will null non-numeric types.
With my dataset I expect different date formats and dirty data with unparseable values. I have a use case where if the value of a cell cannot be formatted then I want to return the current value as opposed to null.
Is there a way to make spark use the original value in df
instead of null if the function for df2
cannot be applied correctly?
What I've tried
I've looked at wrapping Expressions in org.apache.spark.sql.catalyst.expressions
but could not see a way to replace the existing functions.
The only working solution I could find is creating my own date_format
and registering it as a udf but this isn't practical for all functions. I'm looking for a solution that will never return null if the input to a function is non-null or an automated way to wrap all existing spark functions.
Upvotes: 1
Views: 352
Reputation: 22497
You could probably use the coalesce
function for your purposes:
coalesce(date_format(`Year Published`, 'MM/yyyy'), `Year Published`)
Upvotes: 1