Bing Qian

Reputation: 89

In PySpark, how do I add a new column based on a substring of an existing column?

I want to use an "ip" column in a DataFrame, containing IP address strings, to add a new column called "ipClass" based on the first part of the IP "aaa.bbb.ccc.ddd":

say, if aaa < 127, then "Class A"; if aaa == 127, then "Loopback". I'm omitting the remaining cases for brevity.
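To be clear about the rule I'm after, here it is sketched in plain Python (the function name is just for illustration; the goal is to do this with Spark columns):

```python
def ip_class(ip: str) -> str:
    # Take "aaa" from "aaa.bbb.ccc.ddd" and convert it to int
    first_octet = int(ip.split(".")[0])
    if first_octet < 127:
        return "Class A"
    if first_octet == 127:
        return "Loopback"
    return "Class B"  # omitting the remaining cases, as above

print(ip_class("10.0.0.1"))   # Class A
print(ip_class("127.0.0.1"))  # Loopback
```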

So originally I have this DataFrame:

+-----+---------------+
|index|      ip       |
+-----+---------------+
|    1| 10.xx.xx.xxx  |
|    2| 127.xxx.xxx.xx|
|    3| 100.xx.xxx.xx |
|    4| 9.xx.xxx.xx   |
+-----+---------------+

The new DataFrame I want is:

+-----+---------------+-----------+
|index|      ip       |   ipClass |
+-----+---------------+-----------+
|    1| 10.xx.xx.xxx  |  Class A  |
|    2| 127.xxx.xxx.xx|  Loopback |
|    3| 100.xx.xxx.xx |  Class A  |
|    4| 9.xx.xxx.xx   |  Class A  |
|    5| 129.xx.xxx.xx |  Class B  |
+-----+---------------+-----------+

I'm just wondering what the question-mark part below should look like:

    .withColumn("ipClass", when( _?_ < 127, "Class A").when( _?_ == 127, "Loopback"))

The _?_ part needs to take "aaa" from "aaa.bbb.ccc.ddd" and then convert it to an int.

Thank you in advance !

Upvotes: 1

Views: 571

Answers (2)

anky

Reputation: 75080

You can also split on ".", take the first element, cast it to integer, and compare:

import pyspark.sql.functions as F

# Split on a literal dot (raw string so the backslash escapes the regex dot),
# take the first octet, and cast it to integer
splitted_col = F.split("ip", r"\.")[0].cast("integer")
df.withColumn("ipClass", F.when(splitted_col < 127, "Class A")
                          .when(splitted_col == 127, "Loopback")
                          .when(splitted_col > 127, "Class B").otherwise("Test")).show()

+-----+--------------+--------+
|index|            ip| ipClass|
+-----+--------------+--------+
|    1|  10.xx.xx.xxx| Class A|
|    2|127.xxx.xxx.xx|Loopback|
|    3| 100.xx.xxx.xx| Class A|
|    4|   9.xx.xxx.xx| Class A|
|    5| 129.xx.xxx.xx| Class B|
+-----+--------------+--------+

Upvotes: 4

Rayan Ral

Reputation: 1859

There is a pyspark.sql.functions.substring(str, pos, len) function to take a substring of a string; you can then cast the result to int with cast:

from pyspark.sql.functions import col, substring, when
from pyspark.sql.types import IntegerType

df.withColumn("ipClass", when(substring(col("ip"), 0, 3).cast(IntegerType()) < 127, "Class A"))
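One caveat worth noting (my addition, not part of the answer): a fixed three-character prefix is only a valid integer when the first octet actually has three digits, whereas splitting on "." works for any octet width. In plain Python terms:

```python
# A fixed 3-character prefix breaks for short first octets:
ip = "9.xx.xxx.xx"
prefix = ip[:3]                 # "9.x" -- not a valid integer (would cast to null in Spark)
first_octet = ip.split(".")[0]  # "9"   -- robust regardless of octet width
print(prefix, first_octet)
```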

Upvotes: 1
