A B

Reputation: 1936

Pivot on multiple columns dynamically in Spark Dataframe

This is what I am using to pivot on two columns in a DataFrame: I concatenate the two columns and then do the transpose.

import org.apache.spark.sql.functions.{udf, count}
import spark.implicits._   // for the $"..." column syntax

// Define a UDF to concatenate two passed-in string values
val concat = udf( (first: String, second: String) => { first + " " + second } )

def main(args: Array[String]) {

    // pivot on the concatenated column
    domainDF.withColumn("combColumn", concat($"col1", $"col2"))
      .groupBy("someCol").pivot("combColumn").agg(count("combColumn")).show()

  }

My requirement is to make this functionality generic, so that any number of columns can be passed as variable arguments for concatenation. Can anyone suggest a solution? Thanks

Upvotes: 1

Views: 1643

Answers (1)

Shaido

Reputation: 28322

Use the built-in concatenation function instead; it allows for a variable number of input columns. See the documentation.

In this case, you can do:

import org.apache.spark.sql.functions._

domainDF.withColumn("combColumn", concat(Seq($"col1", $"col2"): _*))
  .groupBy("someCol").pivot("combColumn").agg(count("combColumn"))

If you want a separator between the column values, use concat_ws instead. For example, to use a space: concat_ws(" ", Seq(...): _*).
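Generalized to an arbitrary list of column names, the whole pivot can be wrapped in a small helper. This is a sketch, not code from the answer: the name pivotOnColumns, the space separator, and counting on the combined column are my own choices.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, concat_ws, count}

// Hypothetical helper: pivot on any number of columns by first
// concatenating them (space-separated) into one key column.
def pivotOnColumns(df: DataFrame, groupCol: String, pivotCols: Seq[String]): DataFrame = {
  val combined: Column = concat_ws(" ", pivotCols.map(col): _*)
  df.withColumn("combColumn", combined)
    .groupBy(groupCol)
    .pivot("combColumn")
    .agg(count("combColumn"))
}

// Usage: pivotOnColumns(domainDF, "someCol", Seq("col1", "col2", "col3"))
```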


If you need to use a UDF due to other concerns, it's possible to accept a variable number of arguments by wrapping them in an array column, see: Spark UDF with varargs
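The array-wrapping approach can be sketched like this (a sketch under my assumptions: the UDF name concatAll and the null-filtering are illustrative, not from the linked answer). The UDF receives all values as one Seq[String] instead of separate arguments, so it works for any number of columns:

```scala
import org.apache.spark.sql.functions.{array, col, udf}

// UDF over a single array column: join the non-null parts with a space.
val concatAll = udf((parts: Seq[String]) => parts.filter(_ != null).mkString(" "))

// Bundle the individual columns into one array column before calling it:
// domainDF.withColumn("combColumn",
//   concatAll(array(Seq("col1", "col2", "col3").map(col): _*)))
```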

Upvotes: 2
