Reputation: 135
I have a dataframe like this...
val new_df = Seq(("a","b"),("b","a"),("a","c")).toDF("col1","col2")
and I want to create "col3", which is a string concatenation of "col1" and "col2". However, I want the concatenations "ab" and "ba" to be treated the same, sorted alphabetically so that the result is only "ab".
I would like the resulting dataframe to look like this:
val new_df = Seq(("a","b","ab"),("b","a","ab"),("a","c","ac")).toDF("col1","col2","col3")
thanks and have a great day!
Upvotes: 1
Views: 612
Reputation: 4133
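You can do this with the built-in Spark SQL functions, which lets the Spark SQL optimizer work on the expression:
import org.apache.spark.sql.functions.{array, col, concat_ws, sort_array}
// Put both columns in an array, sort it, then join the elements with no separator
new_df.withColumn("col3",
  concat_ws("", sort_array(array(col("col1"), col("col2")))))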
Upvotes: 2
Reputation: 7926
You can just create a UDF that builds the sorted String:
import org.apache.spark.sql.functions.udf
val concatColumns = udf((c1: String, c2: String) => {
  List(c1, c2).sorted.mkString  // sort the two values, then join them
})
And then use it in a withColumn statement, passing the columns you want to concatenate:
new_df.withColumn("col3", concatColumns($"col1", $"col2")).show(false)
Result
+----+----+----+
|col1|col2|col3|
+----+----+----+
|a   |b   |ab  |
|b   |a   |ab  |
|a   |c   |ac  |
+----+----+----+
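The per-row logic inside the UDF can also be checked in plain Scala, without a Spark session. This is only a minimal sketch: the object name `SortedConcatSketch` and the helper `sortedConcat` are hypothetical, and dropping null inputs via `Option` is an assumption added here (the UDF above would fail on a null value when sorting), not something the answer itself does.

```scala
// Hypothetical sketch of the UDF body in plain Scala; names are illustrative only.
object SortedConcatSketch {
  // Assumption: null inputs are dropped instead of causing a NullPointerException.
  def sortedConcat(c1: String, c2: String): String =
    List(c1, c2).flatMap(Option(_)).sorted.mkString

  // The sample rows from the question, as plain tuples.
  val rows: Seq[(String, String)] = Seq(("a", "b"), ("b", "a"), ("a", "c"))

  // One concatenated value per row, independent of the argument order.
  val col3: Seq[String] = rows.map { case (c1, c2) => sortedConcat(c1, c2) }

  def main(args: Array[String]): Unit = col3.foreach(println)
}
```

If that behaviour suits your data, the same method could presumably be wrapped as udf(SortedConcatSketch.sortedConcat _) and used in withColumn exactly as above.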
Upvotes: 1