Reputation: 1936
This is what I am using to pivot on two columns in a DataFrame: I concatenate the two columns and then pivot on the result.
import org.apache.spark.sql.functions.{count, udf}

// Define a UDF to concatenate two passed-in string values
val concat = udf((first: String, second: String) => first + " " + second)

def main(args: Array[String]): Unit = {
  // pivot using the concatenated column
  domainDF.withColumn("combColumn", concat($"col1", $"col2"))
    .groupBy("someCol").pivot("combColumn").agg(count("*")).show()
}
My requirement is to make this functionality generic, so that any number of columns can be passed as variable arguments for concatenation. Can anyone suggest a solution? Thanks
Upvotes: 1
Views: 1643
Reputation: 28322
Use the built-in concatenation function instead; it allows for a variable number of input columns. See the documentation.
In this case, you can do:
import org.apache.spark.sql.functions._

domainDF.withColumn("combColumn", concat(Seq($"col1", $"col2"): _*))
  .groupBy("someCol").pivot("combColumn").agg(count("*"))
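This generalizes naturally to the question's requirement: build the Seq from a list of column names instead of writing each column out. A minimal sketch (the colNames list here is a hypothetical stand-in for whatever columns you pass in):

import org.apache.spark.sql.functions.{col, concat, count}

// Hypothetical list of columns to concatenate; could come from varargs
val colNames = Seq("col1", "col2", "col3")

domainDF.withColumn("combColumn", concat(colNames.map(col): _*))
  .groupBy("someCol").pivot("combColumn").agg(count("*"))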
If you want to use a separator between the column values, use concat_ws instead. For example, to use a space: concat_ws(" ", Seq(...): _*).
If you need to use a UDF due to other concerns, it's possible to pass a variable number of arguments by wrapping the columns in an array, see: Spark UDF with varargs. A sketch of that approach follows below.
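A minimal sketch of that varargs-UDF approach, assuming the input columns are all string-typed (array is the built-in function that wraps the columns, and the UDF receives them as a single Seq[String]):

import org.apache.spark.sql.functions.{array, col, udf}

// The UDF takes one array column and joins its elements with a space
val concatUdf = udf((values: Seq[String]) => values.mkString(" "))

// Wrap any number of columns in an array before passing them to the UDF
val colNames = Seq("col1", "col2", "col3")  // hypothetical input columns
domainDF.withColumn("combColumn", concatUdf(array(colNames.map(col): _*)))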
Upvotes: 2