amcozmo
amcozmo

Reputation: 27

unpivoting the dataframe in spark and scala

I have a dataframe like:

+----------+-----+------+------+-----+---+
|   product|china|france|german|india|usa|
+----------+-----+------+------+-----+---+
|     beans|  496|   200|   210|  234|119|
|    banana| null|   345|   234|  123|122|
|starwberry|  340|   430|   246|  111|321|
|     mango| null|   345|   456|  110|223|
|     chiku|  765|   455|   666|  122|222|
|     apple|  109|   766|   544|  444|333|
+----------+-----+------+------+-----+---+

I want to unpivot it by keeping fixed as mutiple columns like

import spark.implicits._
    val unPivotDF = testData.select($"product",$"german", expr("stack(4, 'china', china, 'usa', usa, 'france', france,'india',india) " +
      "as (Country,Total)"))
    unPivotDF.show()

which gives below o/p:

+----------+------+-------+-----+
|   product|german|Country|Total|
+----------+------+-------+-----+
|     beans|   210|  china|  496|
|     beans|   210|    usa|  119|
|     beans|   210| france|  200|
|     beans|   210|  india|  234|
|    banana|   234|  china| null|
|    banana|   234|    usa|  122|
|    banana|   234| france|  345|
|    banana|   234|  india|  123|
|starwberry|   246|  china|  340|
|starwberry|   246|    usa|  321|
|starwberry|   246| france|  430|
|starwberry|   246|  india|  111|

which is perfect but this fixed column like product and german are runtime information so directly i cannot use the col names in select statement
So what i was doing

val fixedCol= List[String]()
fixedCol= "german" :: fixedCol
fixedCol= "product" :: fixedCol

val col= df.select(fixedCol:_*,expr("stack(.......)") //it gives error as first argument of select is fixed and second arg is varargs

I know it can be done by using but i cannot use sql:

val ss= spark.createOrReplaceTempView(df)
spark.sql("select.......")

Is there any other way to make it dynamic

Upvotes: 0

Views: 85

Answers (1)

QuickSilver
QuickSilver

Reputation: 4045

Convert all column names and exp to List[Column]

val fixedCol : List[Column] = List(col("german") , col("product") , expr("stack(.......)"))
    df.select(fixedCol:_*) 

Upvotes: 1

Related Questions