Reputation: 55
What is the difference between:
F.when(F.col('Name').isNull())
and:
F.when(F.col('Name') == None)
They don't appear to work the same. The first one seems to work better when checking for null values in a column. Sometimes the second method doesn't work for checking null Names. For example if I wanted to check null values and replace the Names that are null to "Missing name" or something, the second method won't do anything sometimes. I'm not sure why it works some times and not other times. But the first method always works.
Upvotes: 3
Views: 13354
Reputation: 1054
In data world, two Null
values (or for the matter two None
) are not identical.
Therefore, if you perform ==
or !=
operation with two None
values, it always results in False
. That is the key reason isNull()
or isNotNull()
functions are built for.
Please take a look at below example for better understanding -
Creating a dataframe
with few valid records and one record with None
from pyspark.sql.types import *
from pyspark.sql import Row
from pyspark.sql.functions import col
lst = [(1,'sometext'),(2,''),(3, None),(4, 'someothertext')]
myrdd = sc.parallelize(lst).map(lambda x: Row(id=x[0], txt=x[1]))
mydf= sqlContext.createDataFrame(myrdd)
isNull()
returns True
for Row#3, thus below statement returns one row -
mydf.filter(col("txt").isNull()).show(truncate=False)
+---+----+
|id |txt |
+---+----+
|3 |null|
+---+----+
==
operator returns False
for Row#3, thus no records are filtered out.
mydf.filter(col("txt") == None).show(truncate=False)
+---+---+
|id |txt|
+---+---+
+---+---+
Upvotes: 1
Reputation: 1712
Refer here : Filter Pyspark dataframe column with None value
Equality based comparisons with NULL won't work because in SQL NULL is undefined so any attempt to compare it with another value returns NULL
Upvotes: 2
Reputation: 1856
F.when(F.col('Name') == None)
None is an object in python, NoneType
. It's not the same as an empty string or 0. For this instance, you would want to use
F.when(F.col('Name') is None)
Using ==
you're checking to see if F.col('Name')
value equals the None object which is going to throw things off.
Upvotes: 2