Reputation: 165
I have a DataFrame and I want to filter it based on the condition below.
df = spark.createDataFrame(
    [(101, 1, 2, 12, 34), (102, 1, 1, 45, 12), (103, 1, 34, 10, 0),
     (104, None, 2, 91, 76), (105, 2, 67, 23, 54), (107, 34, 255, 234, 99),
     (109, None, 1, 23, 55)],
    ['ID', 'A', 'B', 'C', 'D'])
df.show()
+---+----+---+---+---+
| ID| A| B| C| D|
+---+----+---+---+---+
|101| 1| 2| 12| 34|
|102| 1| 1| 45| 12|
|103| 1| 34| 10| 0|
|104|null| 2| 91| 76|
|105| 2| 67| 23| 54|
|107| 34|255|234| 99|
|109|null| 1| 23| 55|
+---+----+---+---+---+
Filtering condition: keep rows where A equals 1, or where A is null and B equals 1.
Expected output
+---+----+---+---+---+
| ID| A| B| C| D|
+---+----+---+---+---+
|101| 1| 2| 12| 34|
|102| 1| 1| 45| 12|
|103| 1| 34| 10| 0|
|109|null| 1| 23| 55|
+---+----+---+---+---+
My code:
import pyspark.sql.functions as F
df = df.filter((F.col('A') == 1) & (F.col('A').isNull()) & (F.col('B') == 1))
df.show()
I am getting an empty DataFrame. Can someone help me? Thanks.
Upvotes: 0
Views: 2651
Reputation: 215127
You need an | (OR) condition instead of & between condition 1 and condition 2. A value cannot both equal 1 and be null, so chaining every predicate with & matches no rows:
import pyspark.sql.functions as F
df.filter((F.col('A') == 1) | (F.col('A').isNull() & (F.col('B') == 1)))
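For completeness, here is a minimal sketch applying the corrected filter to the DataFrame from the question and displaying the result (the result variable name is just for illustration):

import pyspark.sql.functions as F

# Keep rows where A == 1, OR where A is null and B == 1.
# The parentheses around each comparison are required because
# & and | bind more tightly than == in Python.
result = df.filter(
    (F.col('A') == 1) | (F.col('A').isNull() & (F.col('B') == 1))
)
result.show()
# Returns IDs 101, 102, 103 and 109, matching the expected output.

If you prefer SQL syntax, an equivalent form (an alternative phrasing, not part of the answer above) is df.filter("A = 1 OR (A IS NULL AND B = 1)").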
Upvotes: 3