Reputation: 27
I have a dataframe like:
+----------+-----+------+------+-----+---+
| product|china|france|german|india|usa|
+----------+-----+------+------+-----+---+
| beans| 496| 200| 210| 234|119|
| banana| null| 345| 234| 123|122|
|starwberry| 340| 430| 246| 111|321|
| mango| null| 345| 456| 110|223|
| chiku| 765| 455| 666| 122|222|
| apple| 109| 766| 544| 444|333|
+----------+-----+------+------+-----+---+
I want to unpivot it while keeping multiple columns fixed, like:
import spark.implicits._
import org.apache.spark.sql.functions.expr

val unPivotDF = testData.select($"product", $"german",
  expr("stack(4, 'china', china, 'usa', usa, 'france', france, 'india', india) as (Country,Total)"))
unPivotDF.show()
which gives the output below:
+----------+------+-------+-----+
| product|german|Country|Total|
+----------+------+-------+-----+
| beans| 210| china| 496|
| beans| 210| usa| 119|
| beans| 210| france| 200|
| beans| 210| india| 234|
| banana| 234| china| null|
| banana| 234| usa| 122|
| banana| 234| france| 345|
| banana| 234| india| 123|
|starwberry| 246| china| 340|
|starwberry| 246| usa| 321|
|starwberry| 246| france| 430|
|starwberry| 246| india| 111|
This is perfect, but the fixed columns (product and german) are runtime information, so I cannot use the column names directly in the select statement.
So this is what I was doing:
var fixedCol = List[String]()
fixedCol = "german" :: fixedCol
fixedCol = "product" :: fixedCol
val col = df.select(fixedCol: _*, expr("stack(.......)")) // error: the first argument of select is fixed and the second argument is varargs
I know it can be done using SQL, but I cannot use SQL:
df.createOrReplaceTempView("df")
spark.sql("select .......")
Is there any other way to make it dynamic?
Upvotes: 0
Views: 85
Reputation: 4045
Convert all the column names and the expression to a List[Column]:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, expr}
val fixedCol: List[Column] = List(col("german"), col("product"), expr("stack(.......)"))
df.select(fixedCol: _*)
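
For completeness, here is a minimal sketch of how the whole select can be assembled when both the fixed columns and the countries to unpivot are only known at runtime. The lists fixedCols and unpivotCols are illustrative placeholders for whatever your runtime information supplies, and testData is the dataframe from the question:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, expr}

// Placeholder lists; in practice these come from your runtime information
val fixedCols: List[String]   = List("product", "german")
val unpivotCols: List[String] = List("china", "usa", "france", "india")

// Build "stack(4, 'china', china, 'usa', usa, ...) as (Country, Total)" dynamically
val stackExpr = unpivotCols
  .map(c => s"'$c', $c")
  .mkString(s"stack(${unpivotCols.size}, ", ", ", ") as (Country, Total)")

// Combine the fixed columns and the stack expression into a single List[Column]
val selectCols: List[Column] = fixedCols.map(col) :+ expr(stackExpr)

val unPivotDF = testData.select(selectCols: _*)
unPivotDF.show()

Because select takes a single varargs of Column, folding everything into one List[Column] avoids the fixed-first-argument overload that caused the compile error in the question.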
Upvotes: 1