Reputation: 77
I am having trouble creating a new column from the ordered concatenation of two existing columns in a PySpark DataFrame, i.e.:
+------+------+--------+
| Col1 | Col2 | NewCol |
+------+------+--------+
| ORD | DFW | DFWORD |
| CUN | MCI | CUNMCI |
| LAX | JFK | JFKLAX |
+------+------+--------+
In other words, I want to take Col1 and Col2, order them alphabetically, and concatenate them.
Any suggestions?
Upvotes: 0
Views: 735
Reputation: 35249
Combine concat_ws, array and sort_array:
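from pyspark.sql import SparkSession
from pyspark.sql.functions import concat_ws, array, sort_array
# Reuses the active session (e.g. in the pyspark shell) or starts one in a standalone script.
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("ORD", "DFW"), ("CUN", "MCI"), ("LAX", "JFK")],
    ("Col1", "Col2"))
# Put both columns in an array, sort it ascending, then join the elements with no separator.
df.withColumn("NewCol", concat_ws("", sort_array(array("Col1", "Col2")))).show()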
# +----+----+------+
# |Col1|Col2|NewCol|
# +----+----+------+
# | ORD| DFW|DFWORD|
# | CUN| MCI|CUNMCI|
# | LAX| JFK|JFKLAX|
# +----+----+------+
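To see what that expression does for each row, here is a minimal plain-Python sketch of the same logic, with no Spark involved. The names ordered_concat, rows and new_col are made up for illustration, and the sketch assumes both values are non-null strings. It does not model how Spark treats nulls.

```python
def ordered_concat(a, b, sep=""):
    """Plain-Python model of concat_ws(sep, sort_array(array(a, b))) for one row.

    Hypothetical helper for illustration only; assumes a and b are non-null strings.
    """
    # sorted() plays the role of sort_array (ascending order);
    # sep.join() plays the role of concat_ws.
    return sep.join(sorted([a, b]))


# Same sample data as in the answer above.
rows = [("ORD", "DFW"), ("CUN", "MCI"), ("LAX", "JFK")]
new_col = [ordered_concat(c1, c2) for c1, c2 in rows]
print(new_col)  # ['DFWORD', 'CUNMCI', 'JFKLAX']
```

Because the values are sorted before joining, the result is the same whichever column a code appears in, so ("ORD", "DFW") and ("DFW", "ORD") both give DFWORD.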
Upvotes: 4