pythonNinja

Reputation: 499

PySpark: filtering based on column values and applying conditions

I have a pyspark dataframe in this format.

out.show(5)
+----------+-------+-----+
|ip_address| Device|Count|
+----------+-------+-----+
|   2.3.4.5|  Apple|    6|
|   1.2.3.4|Samsung|   18|
|   6.6.6.6|     MI|    8|
|   4.4.4.4|Samsung|   12|
|   8.8.8.8|  Apple|   16|
|   9.9.9.9|Samsung|    8|
+----------+-------+-----+

I want the output to keep only the rows that satisfy either of two conditions: Samsung devices with a Count greater than 10, or non-Samsung devices with a Count greater than 8.

The final output should look like this:

+----------+-------+-----+
|ip_address| Device|Count|
+----------+-------+-----+
|   1.2.3.4|Samsung|   18|
|   4.4.4.4|Samsung|   12|
|   8.8.8.8|  Apple|   16|
+----------+-------+-----+

One way I can think of doing this is to filter on the device type and then apply the count condition, like below, but I want to know whether it can be done with an if/else and then concatenating the output of both conditions.

frSamsung = out.filter(out["Device"].rlike("Samsung"))  # keep only the Samsung rows
fpr = frSamsung.filter(frSamsung.Count > 10)            # then apply the count threshold
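For reference, a minimal sketch of this filter-and-concatenate approach using union, assuming the dataframe is named out as above (the intermediate names samsung, others, and result are just illustrative):

from pyspark.sql import functions as F

# Filter each device group with its own count threshold,
# then concatenate the two results into one dataframe.
samsung = out.filter((F.col("Device") == "Samsung") & (F.col("Count") > 10))
others = out.filter((F.col("Device") != "Samsung") & (F.col("Count") > 8))
result = samsung.union(others)
result.show()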

Upvotes: 0

Views: 655

Answers (2)

Steven

Reputation: 15318

Assuming df is your dataframe:

df.where(
    """
    (Device = 'Samsung' and Count > 10)
    or (Device <> 'Samsung' and Count > 8)
    """
).show()
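Since where accepts a SQL expression string, the whole predicate fits in one call; on the sample data this keeps exactly the three desired rows (the two Samsung rows with Count 18 and 12, and the Apple row with Count 16).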

Upvotes: 1

Rakesh Kumar

Reputation: 4430

Basically you need a composite condition here: the Count threshold depends on the device type, so there are two different conditions to combine:

from pyspark.sql import functions as F

df.where(
    ((F.col("Device") == "Samsung") & (F.col("Count") > 10))
    | ((F.col("Device") != "Samsung") & (F.col("Count") > 8))
).show()
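Note that each comparison needs its own parentheses: in Python the & and | operators bind more tightly than comparisons such as >, so the expression would be parsed incorrectly without them.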

Upvotes: 2
