Reputation: 101
I would like to implement the SQL conditions below in PySpark:
SELECT *
FROM table
WHERE NOT (ID = 1 AND Event = 1)
  AND NOT (ID = 2 AND Event = 2)
  AND NOT (ID = 1 AND Event = 0)
  AND NOT (ID = 2 AND Event = 0)
What would be a clean way to do this?
Upvotes: 1
Views: 1938
Reputation: 42422
If you're feeling lazy, you can just copy and paste the SQL filter expression into the PySpark filter, since filter also accepts a SQL expression string:
df.filter("""
NOT ( ID = 1
AND Event = 1
)
AND NOT ( ID = 2
AND Event = 2
)
AND NOT ( ID = 1
AND Event = 0
)
AND NOT ( ID = 2
AND Event = 0
)
""")
Upvotes: 1
Reputation: 1058
You can use the filter or where function of the DataFrame API.
The equivalent code would be as follows:
df.filter(~((df.ID == 1) & (df.Event == 1)) &
~((df.ID == 2) & (df.Event == 2)) &
~((df.ID == 1) & (df.Event == 0)) &
~((df.ID == 2) & (df.Event == 0)))
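If the list of excluded (ID, Event) pairs may grow, one way to avoid repeating this pattern is to build the condition in a loop. This is a sketch of that idea, not part of the original answer, and the excluded list here is hypothetical:

from functools import reduce
from pyspark.sql import functions as F

# Hypothetical list of (ID, Event) pairs to exclude
excluded = [(1, 1), (2, 2), (1, 0), (2, 0)]

# AND together one negated condition per excluded pair
cond = reduce(
    lambda acc, pair: acc & ~((F.col("ID") == pair[0]) & (F.col("Event") == pair[1])),
    excluded,
    F.lit(True),
)
df.filter(cond)

For larger exclusion lists, an equivalent and often cleaner alternative is a left anti join against a small DataFrame of the excluded pairs, e.g. df.join(spark.createDataFrame(excluded, ["ID", "Event"]), on=["ID", "Event"], how="left_anti").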
Upvotes: 2