Reputation: 3715
How can I duplicate a row so that each IP address gets its own row, turning this
|source_ip |dest_ip |source_port|dest_port|
|192.168.1.1|10.0.0.1|5343 |22 |
Into
|ip |source_port|dest_port|
|192.168.1.1|5343 |22 |
|10.0.0.1 |5343 |22 |
Using pyspark?
Upvotes: 1
Views: 38
Reputation: 31490
Try combining array and explode: put both IP columns into an array, then explode it into one row per element.
Example:
from pyspark.sql.functions import array, col

df.show()
#+-----------+--------+-----------+---------+
#|  source_ip| dest_ip|source_port|dest_port|
#+-----------+--------+-----------+---------+
#|192.168.1.1|10.0.0.1|       5343|       22|
#+-----------+--------+-----------+---------+

# Collect both IP columns into an array, then explode it into one row per IP.
df.withColumn("arr", array(col("source_ip"), col("dest_ip"))).\
    selectExpr("explode(arr) as ip", "source_port", "dest_port").\
    show()
#+-----------+-----------+---------+
#|         ip|source_port|dest_port|
#+-----------+-----------+---------+
#|192.168.1.1|       5343|       22|
#|   10.0.0.1|       5343|       22|
#+-----------+-----------+---------+
Upvotes: 1