Reputation: 59
I have a dataframe to which I am applying a pivot transformation, and I want to know if there is a way to get the same end result while avoiding the pivot. The dataframe looks like this:
+------+---+----+-----+--------+
|gender|pro|week|share|forecast|
+------+---+----+-----+--------+
|  Male|  A|  40|  0.2|   195.0|
|Female|  A|  40| 0.01|    38.0|
|  Male|  B|  40| 0.15|   733.0|
|Female|  B|  41| 0.15|   579.0|
|Female|  C|  41| 0.01|    38.0|
+------+---+----+-----+--------+
The expected output is the following:
+------+---+----+-------+--------+--------+--------+
|gender|pro|week|share_1|share_10|share_15|share_20|
+------+---+----+-------+--------+--------+--------+
|  Male|  A|  40|    0.0|     0.0|     0.0|   195.0|
|Female|  A|  40|   38.0|     0.0|     0.0|     0.0|
|Female|  B|  41|    0.0|     0.0|   579.0|     0.0|
|Female|  C|  41|   38.0|     0.0|     0.0|     0.0|
|  Male|  B|  40|  191.0|   205.0|   733.0|   245.0|
+------+---+----+-------+--------+--------+--------+
At the moment I am implementing this:
from pyspark.sql.functions import first

(df.groupBy('gender', 'pro', 'week')
   .pivot('share')
   .agg(first('forecast'))
   .withColumnRenamed('0.01', 'share_1')
   .withColumnRenamed('0.1', 'share_10')
   .withColumnRenamed('0.15', 'share_15')
   .withColumnRenamed('0.2', 'share_20'))
Is there a way to get the same result without applying a pivot transformation?
Upvotes: 1
Views: 395
Reputation: 15318
Performance is poor because you do not provide values for the share column.
cf. the documentation for pivot(pivot_col, values=None):
Not providing values is more concise but less efficient, because Spark needs to first compute the list of distinct values internally.
I can assure you that the current official implementation of pivot will always be better than anything you'll try to build yourself. Just add your values and it will be fine.
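For reference, a minimal sketch of your same pivot with the values supplied up front (assuming the column names from your question):

from pyspark.sql.functions import first

# Listing the share values here lets Spark skip the extra job that
# computes the distinct values of `share` before pivoting.
shares = [0.01, 0.1, 0.15, 0.2]

result = (df.groupBy('gender', 'pro', 'week')
            .pivot('share', shares)
            .agg(first('forecast'))
            .withColumnRenamed('0.01', 'share_1')
            .withColumnRenamed('0.1', 'share_10')
            .withColumnRenamed('0.15', 'share_15')
            .withColumnRenamed('0.2', 'share_20'))

The output is identical to what you have now; only the extra distinct-values computation is avoided.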
Upvotes: 2