Mahima

Reputation: 45

How to filter a DataFrame in PySpark

Given the following DataFrame:

A B C
1 0 0
3 0 1
4 0 8
5 0 0

How can I filter this DataFrame to drop every row in which all columns except the first contain 0? In other words, a row should be kept only if at least one column other than A is non-zero, giving:

A B C
3 0 1
4 0 8
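To pin down the condition, here is a minimal plain-Python sketch that is independent of Spark. It assumes the rows are (A, B, C) tuples, and `keep_row` is a hypothetical helper name. A row is kept when any value after the first column is non-zero:

```python
# Sample rows from the question, as (A, B, C) tuples.
rows = [(1, 0, 0), (3, 0, 1), (4, 0, 8), (5, 0, 0)]


def keep_row(row):
    """Hypothetical helper: True unless every value after the first column is 0."""
    return any(value != 0 for value in row[1:])


filtered = [row for row in rows if keep_row(row)]
print(filtered)  # [(3, 0, 1), (4, 0, 8)]
```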

Upvotes: 0

Views: 114

Answers (1)

mck

Reputation: 42422

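Try checking each column individually and combining the resulting Booleans with greatest. Because true sorts above false, greatest acts as a logical OR, so a row is kept if at least one column other than the first is non-zero: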

import pyspark.sql.functions as F

# One Boolean per column after the first; greatest() is true if any of them is true
df2 = df.filter(F.greatest(*[F.col(c) != 0 for c in df.columns[1:]]))

df2.show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  3|  0|  1|
|  4|  0|  8|
+---+---+---+

Upvotes: 2
