Chris

Reputation: 3715

PySpark: duplicate a row for each IP column

How can I duplicate each row, once for each IP column, turning this

|source_ip  |dest_ip |source_port|dest_port|
|192.168.1.1|10.0.0.1|5343       |22       |

into

|ip         |source_port|dest_port|
|192.168.1.1|5343       |22       |
|10.0.0.1   |5343       |22       |

using PySpark?

Upvotes: 1

Views: 38

Answers (1)

notNull

Reputation: 31490

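Try array and explode: pack the two IP columns into a single array column, then explode that array so each element becomes its own row. The other columns are repeated on every resulting row.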

Example:

from pyspark.sql.functions import array, col

df.show()
#+-----------+--------+-----------+---------+
#|  source_ip| dest_ip|source_port|dest_port|
#+-----------+--------+-----------+---------+
#|192.168.1.1|10.0.0.1|       5343|       22|
#+-----------+--------+-----------+---------+

# Pack both IP columns into one array column, then explode it into one row per IP.
df.withColumn("arr", array(col("source_ip"), col("dest_ip"))).\
selectExpr("explode(arr) as ip", "source_port", "dest_port").\
show()
#+-----------+-----------+---------+
#|         ip|source_port|dest_port|
#+-----------+-----------+---------+
#|192.168.1.1|       5343|       22|
#|   10.0.0.1|       5343|       22|
#+-----------+-----------+---------+
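
If it helps to see what array followed by explode does to a single row, here is a rough plain-Python model of the same fan-out. It is only a sketch and does not use Spark: the helper name explode_ips is made up for illustration, and the column names are taken from the question.

```python
def explode_ips(rows):
    """Yield one output row per IP address found in each input row."""
    for row in rows:
        # Rough equivalent of array(source_ip, dest_ip): both values, in order.
        for ip in (row["source_ip"], row["dest_ip"]):
            # Rough equivalent of explode: one output row per array element,
            # with the remaining columns copied along unchanged.
            yield {
                "ip": ip,
                "source_port": row["source_port"],
                "dest_port": row["dest_port"],
            }


rows = [
    {"source_ip": "192.168.1.1", "dest_ip": "10.0.0.1",
     "source_port": 5343, "dest_port": 22},
]

result = list(explode_ips(rows))
for r in result:
    print(r)
```

Each input row produces exactly two output rows, source IP first and destination IP second, which matches the order of the elements passed to array in the Spark version.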

Upvotes: 1
