Achilles

Reputation: 741

Pivoting DataFrame - Spark SQL

I have a DataFrame containing the following:

TradeId|Source
ABC|"USD,333.123,20170605|USD,-789.444,20170605|GBP,1234.567,20150602"

I want to pivot this data so that it turns into the following:

TradeId|CCY|PV
ABC|USD|333.123
ABC|USD|-789.444
ABC|GBP|1234.567

The number of CCY|PV|Date triplets in the "Source" column is not fixed. I could do this with an ArrayList, but that requires loading the data into the JVM and defeats the whole point of Spark.

Let's say my DataFrame is built as follows:

// Spark 1.x Java API: load the snapshot, register it as a temp table, select the two columns
DataFrame tradesSnap = this.loadTradesSnap(reportRequest);
String tempTable = getTempTableName();
tradesSnap.registerTempTable(tempTable);
tradesSnap = tradesSnap.sqlContext().sql("SELECT TradeId, Source FROM " + tempTable);

Upvotes: 0

Views: 676

Answers (2)

Ramesh Maharjan

Reputation: 41987

If you read the Databricks documentation on pivot, it says: "A pivot is an aggregation where one (or more in the general case) of the grouping columns has its distinct values transposed into individual columns." And that is not what you want here, I guess.
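
For contrast, here is what an actual pivot looks like in Spark (a minimal sketch; the df and its columns year, course and earnings are made up for illustration):

// A pivot transposes the distinct values of a grouping column into
// individual output columns: one column per distinct course, aggregated per year
df.groupBy("year").pivot("course").sum("earnings")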

I would suggest you use withColumn and the built-in functions to get the final output you want. You can do the following, assuming dataframe is what you have:

+-------+----------------------------------------------------------------+
|TradeId|Source                                                          |
+-------+----------------------------------------------------------------+
|ABC    |USD,333.123,20170605|USD,-789.444,20170605|GBP,1234.567,20150602|
+-------+----------------------------------------------------------------+

You can use explode, split and withColumn to get the desired output:

import org.apache.spark.sql.functions.{col, explode, split}

// Explode the pipe-separated triplets into one row each,
// then split every triplet into its three comma-separated fields
val explodedDF = dataframe.withColumn("Source", explode(split(col("Source"), "\\|")))
val finalDF = explodedDF
  .withColumn("CCY", split(col("Source"), ",")(0))
  .withColumn("PV", split(col("Source"), ",")(1))
  .withColumn("Date", split(col("Source"), ",")(2))
  .drop("Source")

finalDF.show(false)

The final output is

+-------+---+--------+--------+
|TradeId|CCY|PV      |Date    |
+-------+---+--------+--------+
|ABC    |USD|333.123 |20170605|
|ABC    |USD|-789.444|20170605|
|ABC    |GBP|1234.567|20150602|
+-------+---+--------+--------+
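
As a side note, you could also split each exploded triplet only once into a temporary array column and pick the fields out of it by index (a sketch under the same assumptions; adding .drop("Date") at the end would give exactly the TradeId|CCY|PV output from the question):

// Split once, then select the array elements by index
val finalDF2 = explodedDF
  .withColumn("parts", split(col("Source"), ","))
  .select(col("TradeId"),
          col("parts")(0).as("CCY"),
          col("parts")(1).as("PV"),
          col("parts")(2).as("Date"))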

I hope this solves your issue

Upvotes: 2

stefanobaghino

Reputation: 12814

Rather than pivoting, what you are trying to achieve looks more like a flatMap.

To put it simply, using flatMap on a Dataset applies to each row a function (map) that itself produces a sequence of rows; each resulting sequence is then concatenated into a single sequence (flat).
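
The same semantics can be seen on plain Scala collections (the data here is made up):

// map keeps one sequence per element; flatMap concatenates them
val rows = Seq("a,1|b,2", "c,3")
rows.map(_.split("\\|").toSeq)     // Seq(Seq("a,1", "b,2"), Seq("c,3"))
rows.flatMap(_.split("\\|").toSeq) // Seq("a,1", "b,2", "c,3")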

The following program shows the idea:

import org.apache.spark.sql.SparkSession

case class Input(TradeId: String, Source: String)

case class Output(TradeId: String, CCY: String, PV: String, Date: String)

object FlatMapExample {

  // This function produces one output row for each CCY,PV,Date triplet in the input line
  def splitSource(in: Input): Seq[Output] =
    in.Source.split("\\|", -1).map {
      source =>
        val Array(ccy, pv, date) = source.split(",", -1)
        Output(in.TradeId, ccy, pv, date)
    }

  def main(args: Array[String]): Unit = {

    // Initialization and loading
    val spark = SparkSession.builder().master("local").appName("pivoting-example").getOrCreate()
    import spark.implicits._
    val input = spark.read.options(Map("sep" -> "|", "header" -> "true")).csv(args(0)).as[Input]

    // For each line in the input, split the source and then
    // concatenate each "sub-sequence" into a single `Dataset`
    input.flatMap(splitSource).show
  }

}

Given your input, this would be the output:

+-------+---+--------+--------+
|TradeId|CCY|      PV|    Date|
+-------+---+--------+--------+
|    ABC|USD| 333.123|20170605|
|    ABC|USD|-789.444|20170605|
|    ABC|GBP|1234.567|20150602|
+-------+---+--------+--------+

You can now take the result and save it to a CSV, if you want.
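
For example (a minimal sketch; the output path is hypothetical):

input.flatMap(splitSource).write.option("header", "true").csv("/tmp/trades_flat")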

Upvotes: 2
