Reputation: 499
I have a pyspark dataframe in this format.
out.show(5)
+----------+-------+-----+
|ip_address| Device|Count|
+----------+-------+-----+
|   2.3.4.5|  Apple|    6|
|   1.2.3.4|Samsung|   18|
|   6.6.6.6|     MI|    8|
|   4.4.4.4|Samsung|   12|
|   8.8.8.8|  Apple|   16|
|   9.9.9.9|Samsung|    8|
+----------+-------+-----+
I want to filter this so the output keeps the rows that satisfy either of two conditions: Samsung devices with Count > 10, or non-Samsung devices with Count > 8.
The final output should look like this:
+----------+-------+-----+
|ip_address| Device|Count|
+----------+-------+-----+
|   1.2.3.4|Samsung|   18|
|   4.4.4.4|Samsung|   12|
|   8.8.8.8|  Apple|   16|
+----------+-------+-----+
One way I can think of doing this is filtering by device type and then applying the count condition, like below, but I want to know if it can be done with an if/else-style condition and then concatenating the output of both conditions:
frSamsung = out.filter(out["Device"].rlike("Samsung"))
fpr = frSamsung.filter(frSamsung.Count > 10)
Upvotes: 0
Views: 655
Reputation: 15318
Assuming df is your dataframe:
df.where(
    """
    Device = 'Samsung' and Count > 10
    or Device <> 'Samsung' and Count > 8
    """
).show()
Upvotes: 1
Reputation: 4430
Basically you need a composite condition here: the count threshold depends on the device type, so there are two different conditions -
from pyspark.sql import functions as F

df.where(
    ((F.col("Device") == "Samsung") & (F.col("Count") > 10)) |
    ((F.col("Device") != "Samsung") & (F.col("Count") > 8))
).show()
Upvotes: 2