Gaurav Kumar
Gaurav Kumar

Reputation: 31

How to unpivot multiple columns in PySpark?

The columns CGL, CPL, EO should become Coverage Type, the values for CGL, CPL, EO should go in column Premium, and values for CGLTria,CPLTria,EOTria should go in column Tria Premium.

declare @TestDate table  ( 
                            QuoteGUID varchar(8000), 
                            CGL money, 
                            CGLTria money, 
                            CPL money,
                            CPLTria money,
                            EO money,
                            EOTria money
                            )

INSERT INTO @TestDate (QuoteGUID, CGL, CGLTria, CPL, CPLTria, EO, EOTria)
VALUES ('2D62B895-92B7-4A76-86AF-00138C5C8540', 2000, 160, 674, 54, 341, 0),
       ('BE7F9483-174F-4238-8931-00D09F99F398', 0, 0, 3238, 259, 0, 0),
       ('BECFB9D8-D668-4C06-9971-0108A15E1EC2', 0, 0, 0, 0, 0, 0)

SELECT * FROM @TestDate

OUTPUT:

1

The result should be like that:

2

I have done it using cross apply with values in SQL, but I want to implement it using PySpark.

Upvotes: 3

Views: 6202

Answers (1)

ZygD
ZygD

Reputation: 24508

Before usual unpivoting (using stack), you could combine groups of columns to one column using array.

Input:

from pyspark.sql import functions as F

df = spark.createDataFrame(
    [('2D62B895-92B7-4A76-86AF-00138C5C8540', 2000, 160, 674, 54, 341, 0),
     ('BE7F9483-174F-4238-8931-00D09F99F398', 0, 0, 3238, 259, 0, 0),
     ('BECFB9D8-D668-4C06-9971-0108A15E1EC2', 0, 0, 0, 0, 0, 0)],
    ['QuoteGUID', 'CGL', 'CGLTria', 'CPL', 'CPLTria', 'EO', 'EOTria']
)

Script:

df2 = df.select(
    'QuoteGUID',
    F.expr("stack(3, 'CGL', array(CGL, CGLTria), 'CPL', array(CPL, CPLTria), 'EO', array(EO, EOTria)) as (CoverageType, _P)")
)
df3 = df2.select(
    'QuoteGUID',
    'CoverageType',
    F.col('_P')[0].alias('Premium'),
    F.col('_P')[1].alias('TriaPremium'),
)

Result:

df3.show(truncate=0)
# +------------------------------------+------------+-------+-----------+
# |QuoteGUID                           |CoverageType|Premium|TriaPremium|
# +------------------------------------+------------+-------+-----------+
# |2D62B895-92B7-4A76-86AF-00138C5C8540|CGL         |2000   |160        |
# |2D62B895-92B7-4A76-86AF-00138C5C8540|CPL         |674    |54         |
# |2D62B895-92B7-4A76-86AF-00138C5C8540|EO          |341    |0          |
# |BE7F9483-174F-4238-8931-00D09F99F398|CGL         |0      |0          |
# |BE7F9483-174F-4238-8931-00D09F99F398|CPL         |3238   |259        |
# |BE7F9483-174F-4238-8931-00D09F99F398|EO          |0      |0          |
# |BECFB9D8-D668-4C06-9971-0108A15E1EC2|CGL         |0      |0          |
# |BECFB9D8-D668-4C06-9971-0108A15E1EC2|CPL         |0      |0          |
# |BECFB9D8-D668-4C06-9971-0108A15E1EC2|EO          |0      |0          |
# +------------------------------------+------------+-------+-----------+

Upvotes: 3

Related Questions