Ranking a pivoted column

Question

I want to pivot a column and then rank the data from the pivoted column. Here is sample data:

| id | objective | metric      | score |
|----|-----------|-------------|-------|
| 1  | Sales     | Total Sales | 10    |
| 1  | Marketing | Total Reach | 4     |
| 2  | Sales     | Total Sales | 2     |
| 2  | Marketing | Total Reach | 11    |
| 3  | Sales     | Total Sales | 9     |

This would be my expected output after pivot + rank:

| id |  Sales | Marketing |
|----|--------|-----------|
| 1  | 1      | 2         |
| 2  | 3      | 1         |
| 3  | 2      | 3         |

The ranking is based on sum(score) from each objective. An objective can have also have multiple metrics but that isn't included in the sample for simplicity.

I have been able to successfully pivot and count the scores like so:

pivot = (
    spark.table('scoring_table')
    .select('id', 'objective', 'metric', 'score')
    .groupBy('id')
    .pivot('objective')
    .agg(
        sf.sum('score').alias('score')
    )

This then lets me see the total score per objective, but I'm unsure how to rank these. I have tried the following after aggregation:

.withColumn('rank', rank().over(Window.partitionBy('id', 'objective').orderBy(sf.col('score').desc())))

However objective is no longer callable from this point as it has been pivoted. I then tried this instead:

.withColumn('rank', rank().over(Window.partitionBy('id', 'Sales', 'Marketing').orderBy(sf.col('score').desc())))

But also the score column is no longer available. How can I rank these scores after pivoting the data?

mck · Accepted Answer

You just need to order by the score after pivot:

from pyspark.sql import functions as F, Window

df2 = df.groupBy('id').pivot('objective').agg(F.sum('score')).fillna(0)

df3 = df2.select(
    'id', 
    *[F.rank().over(Window.orderBy(F.desc(c))).alias(c) for c in df2.columns[1:]]
)

df3.show()
+---+---------+-----+
| id|Marketing|Sales|
+---+---------+-----+
|  2|        1|    3|
|  1|        2|    1|
|  3|        3|    2|
+---+---------+-----+

Ranking a pivoted column

Answers (1)

Related Questions