koh-ding

Reputation: 135

How do I concatenate two string columns in Spark (Scala) so that the values appear in alphabetical order?

I have a dataframe like this...

  val new_df = Seq(("a", "b"), ("b", "a"), ("a", "c")).toDF("col1", "col2")

I want to create "col3", a string concatenation of "col1" and "col2". However, "ab" and "ba" should be treated as the same value: the two parts should be sorted alphabetically before concatenating, so that both rows produce "ab".

I would like the resulting dataframe to look like this:

  val new_df = Seq(("a", "b", "ab"), ("b", "a", "ab"), ("a", "c", "ac")).toDF("col1", "col2", "col3")

Thanks, and have a great day!

Upvotes: 1

Views: 612

Answers (2)

Emiliano Martinez

Reputation: 4133

You can do this with built-in Spark SQL functions, which lets Spark's optimizer work on the expression (a UDF is opaque to it):

import org.apache.spark.sql.functions.{array, col, concat_ws, sort_array}

new_df.withColumn("col3", 
  concat_ws("", 
    sort_array(array(col("col1"), col("col2")))))
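
For reference, here is a self-contained sketch of the same approach that can be pasted into a Scala script or spark-shell. The local SparkSession, the app name and the local[*] master are assumptions for illustration, not part of the question; in spark-shell the spark value already exists. One behaviour worth knowing: sort_array sorts ascending by default and concat_ws skips nulls, so a row with one null column should produce just the other value.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, concat_ws, sort_array}

// Assumption: a throwaway local session for experimentation.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sorted-concat")
  .getOrCreate()
import spark.implicits._ // enables .toDF on a local Seq

val df = Seq(("a", "b"), ("b", "a"), ("a", "c")).toDF("col1", "col2")

// Pack both columns into an array, sort it ascending,
// then join the elements with an empty separator.
val result = df.withColumn(
  "col3",
  concat_ws("", sort_array(array(col("col1"), col("col2"))))
)

result.show(false)
```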

Upvotes: 2

SCouto

Reputation: 7926

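You can define a UDF that sorts the two values and concatenates them:

  import org.apache.spark.sql.functions.udf

  // Sort the two values alphabetically, then join them into one string
  val concatColumns = udf((c1: String, c2: String) => {
    List(c1, c2).sorted.mkString
  })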

Then call it inside withColumn, passing the columns you want to concatenate (the $"..." syntax requires import spark.implicits._):

  new_df.withColumn("col3", concatColumns($"col1", $"col2")).show(false)

Result:

  +----+----+----+
  |col1|col2|col3|
  +----+----+----+
  |a   |b   |ab  |
  |b   |a   |ab  |
  |a   |c   |ac  |
  +----+----+----+
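
One caveat with this UDF: List(c1, c2).sorted throws a NullPointerException if either column is null, because the default String ordering calls compareTo. Below is a minimal sketch of a null-tolerant variant, with the logic pulled into a plain function so it can be checked without Spark. The name sortedConcat is hypothetical, and dropping nulls (rather than returning null) is an assumption chosen to mirror what concat_ws does.

```scala
// Hypothetical helper: sorts the non-null inputs and concatenates them.
// Assumption: nulls are dropped, mirroring concat_ws; change this if
// a null input should instead make the whole result null.
def sortedConcat(c1: String, c2: String): String =
  List(c1, c2).flatMap(Option(_)).sorted.mkString

// Quick checks in plain Scala, no Spark needed.
println(sortedConcat("b", "a"))  // ab
println(sortedConcat("a", "c"))  // ac
println(sortedConcat(null, "a")) // a
```

To use it from Spark, it should be possible to wrap it with udf(sortedConcat _) and call the result in withColumn exactly as above.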

Upvotes: 1
