Reputation: 45
If the given data frame is
A B C
1 0 0
3 0 1
4 0 8
5 0 0
How do we filter the above data frame to drop the rows in which every column except the first contains 0, so that only the following rows remain?
A B C
3 0 1
4 0 8
Upvotes: 0
Views: 114
Reputation: 42422
Try checking each column individually and combining the Booleans using greatest:
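import pyspark.sql.functions as F
# Keep a row if at least one column after the first is non-zero;
# greatest over the per-column Booleans acts as a logical OR.
df2 = df.filter(F.greatest(*[F.col(c) != 0 for c in df.columns[1:]]))
df2.show()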
+---+---+---+
| A| B| C|
+---+---+---+
| 3| 0| 1|
| 4| 0| 8|
+---+---+---+
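To see why taking the greatest of the per-column Booleans works, here is a minimal plain-Python sketch of the same logic, with no Spark involved. The names rows, columns and keep_row are hypothetical and exist only for this illustration. Python's max() stands in for greatest: because True orders above False, the maximum of the flags is True as soon as one column is non-zero.

```python
# Plain-Python model of the filter above; no Spark required.
# rows, columns and keep_row are illustrative names, not part of PySpark.
rows = [
    {"A": 1, "B": 0, "C": 0},
    {"A": 3, "B": 0, "C": 1},
    {"A": 4, "B": 0, "C": 8},
    {"A": 5, "B": 0, "C": 0},
]
columns = ["A", "B", "C"]


def keep_row(row, value_columns):
    # One Boolean per column: True when the value is non-zero.
    flags = [row[c] != 0 for c in value_columns]
    # max() over Booleans behaves like a logical OR, since True > False.
    return max(flags)


# Skip the first column, as df.columns[1:] does in the answer.
kept = [row for row in rows if keep_row(row, columns[1:])]
print(kept)
```

Only the rows with A equal to 3 and 4 survive, matching the output above. Note that max() raises an error on an empty list, so this sketch assumes there is at least one column after the first.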
Upvotes: 2