rwxrwxr--
rwxrwxr--

Reputation: 45

Pivoting a single row Spark dataframe with pivot

I am new to spark and I want, using scala, to pivot a single row of a data frame as the following:

+--------------+-------+-------+-------+-------+-------+-------+-------+
|       Country| 3/7/20| 3/8/20| 3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
+--------------+-------+-------+-------+-------+-------+-------+-------+
|         Japan|      0|      4|     10|     18|     27|     31|     35|
+--------------+-------+-------+-------+-------+-------+-------+-------+

My pivoted data frame should look like the following

+--------------+-------+
|       Country| Japan |
+--------------+-------+
|        3/7/20|      0|
+--------------+-------+ 
|        3/8/20|      4|
+--------------+-------+
|        3/9/20|     10|
+--------------+-------+
|       3/10/20|     18|
+--------------+-------+
|           ...|    ...|
+--------------+-------+

I have tried using the following, but I am not sure I get the aggregation expression correctly:

val pivoted = df.groupBy("Country").pivot("Country", Seq("Japan")).agg(col("Country"))

Upvotes: 0

Views: 234

Answers (1)

Som
Som

Reputation: 6323

try this-

Use stack

 df2.show(false)
    df2.printSchema()
    /**
      * +-------+------+------+------+-------+-------+-------+-------+
      * |Country|3/7/20|3/8/20|3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
      * +-------+------+------+------+-------+-------+-------+-------+
      * |Japan  |0     |4     |10    |18     |27     |31     |35     |
      * +-------+------+------+------+-------+-------+-------+-------+
      *
      * root
      * |-- Country: string (nullable = true)
      * |-- 3/7/20: integer (nullable = true)
      * |-- 3/8/20: integer (nullable = true)
      * |-- 3/9/20: integer (nullable = true)
      * |-- 3/10/20: integer (nullable = true)
      * |-- 3/11/20: integer (nullable = true)
      * |-- 3/12/20: integer (nullable = true)
      * |-- 3/13/20: integer (nullable = true)
      */
    val stringCol = df2.columns.map(c => s"'$c', cast(`$c` as string)").mkString(", ")
    val processedDF = df2.selectExpr(s"stack(${df2.columns.length}, $stringCol) as (col_1, col_2)")
    processedDF.show(false)
    /**
      * +-------+-----+
      * |col_1  |col_2|
      * +-------+-----+
      * |Country|Japan|
      * |3/7/20 |0    |
      * |3/8/20 |4    |
      * |3/9/20 |10   |
      * |3/10/20|18   |
      * |3/11/20|27   |
      * |3/12/20|31   |
      * |3/13/20|35   |
      * +-------+-----+
      */

Upvotes: 1

Related Questions