Reputation: 3168
I have a simple problem, but can't find an easy solution.
I noticed the following:
from pyspark.sql.functions import col

myDF.withColumn("newColumn", col("aNullableColumn"))
Then in the schema, newColumn becomes nullable, even if there are no null values in aNullableColumn. How can I get newColumn to be not nullable?
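A quick way to see the behavior (a minimal sketch using the placeholder names from above):

myDF.withColumn("newColumn", col("aNullableColumn")).printSchema()
# the printed schema shows newColumn with "nullable = true",
# mirroring aNullableColumn, regardless of the actual data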
I googled a little bit; the only solution I found is to rewrite the schema and recreate the DataFrame (a sketch of that approach is shown below), but this isn't a nice solution.
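For reference, the schema-rewriting workaround mentioned above typically looks something like this (a minimal sketch, assuming a SparkSession named spark and that the column really contains no nulls):

from pyspark.sql.types import StructType, StructField

df = myDF.withColumn("newColumn", col("aNullableColumn"))

# rebuild the schema, marking newColumn as not nullable
new_schema = StructType([
    StructField(f.name, f.dataType, f.nullable if f.name != "newColumn" else False)
    for f in df.schema.fields
])

# recreate the DataFrame against the adjusted schema
df_not_null = spark.createDataFrame(df.rdd, new_schema)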
Upvotes: 2
Views: 3153
Reputation: 3344
If you are absolutely sure that your column has no null values, you can change the nullability property of your new column like this:
from pyspark.sql.functions import col, lit, coalesce
myDF.withColumn("newColumn", coalesce(col("aNullableColumn"), lit(0)))
Make sure to use the correct data type inside the lit function (the same data type as your aNullableColumn). Also be aware that if there is a null value, coalesce will replace it with the value you provide inside lit.
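For example, if aNullableColumn were a string column, the placeholder would need to be a string literal as well (a hypothetical variation of the snippet above):

# lit("") matches a StringType column; any null would become an empty string
myDF.withColumn("newColumn", coalesce(col("aNullableColumn"), lit("")))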
The reason this works is how coalesce handles the nullable property. This is taken directly from the Spark source code:
Coalesce is nullable if all of its children are nullable, or if it has no children.
Here the second child is lit(0), which is not nullable, so the resulting column will not be nullable either.
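You can verify this by inspecting the schema of the result (a minimal sketch using the placeholder names from the question):

result = myDF.withColumn("newColumn", coalesce(col("aNullableColumn"), lit(0)))
result.printSchema()
# newColumn now shows "nullable = false" in the printed schema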
Upvotes: 2