paulo

Reputation: 59

Pyspark Avoid Pivot Transformation To Dataframe - Pivot Alternative

I have a dataframe to which I am applying a pivot transformation, and I want to know whether there is a way to get the same end result while avoiding the pivot. The dataframe looks like this:

|gender|         pro|week|        share|forecast|
+------+------------+----+-------------+--------+
|  Male|           A|  40|          0.2|   195.0|
|Female|           A|  40|         0.01|    38.0|
|  Male|           B|  40|         0.15|   733.0|
|Female|           B|  41|         0.15|   579.0|
|Female|           C|  41|         0.01|    38.0|

The expected output is the following:

|gender|      pro|week|    share_1|    share_10|    share_15|    share_20|
+------+---------+----+-----------+------------+------------+------------+
|  Male|        A|  40|        0.0|         0.0|         0.0|       195.0|
|Female|        A|  40|       38.0|         0.0|         0.0|         0.0|
|Female|        B|  41|        0.0|         0.0|       579.0|         0.0|
|Female|        C|  41|       38.0|         0.0|         0.0|         0.0|
|  Male|        B|  40|      191.0|       205.0|       733.0|       245.0|

At the moment I am implementing this:

from pyspark.sql.functions import first

# Pivot on "share", then rename the generated columns
(
    df.groupBy(['gender', 'pro', 'week'])
    .pivot("share")
    .agg(first('forecast'))
    .withColumnRenamed('0.01', 'share_1')
    .withColumnRenamed('0.1', 'share_10')
    .withColumnRenamed('0.15', 'share_15')
    .withColumnRenamed('0.2', 'share_20')
)

Is there a way to get the same result without applying a pivot transformation?

Upvotes: 1

Views: 395

Answers (1)

Steven

Reputation: 15318

Performance is poor because you do not provide the values for the share column.

From the documentation for pivot(pivot_col, values=None):

Not providing values is more concise but less efficient, because Spark needs to first compute the list of distinct values internally.

I can assure you that the current official implementation of pivot will perform better than anything you write yourself. Just pass your values and it will be fine.
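As a minimal sketch of what passing the values could look like: the list SHARE_VALUES and the helpers share_column_name and pivot_with_values are hypothetical names introduced here, and the sketch assumes the share column only ever holds the four values that the question's renames mention. It also assumes Spark names each pivoted column after the string form of its value, as those renames suggest.

```python
# Assumption: these are the only distinct values in the "share" column,
# taken from the column renames in the question.
SHARE_VALUES = [0.01, 0.1, 0.15, 0.2]


def share_column_name(value):
    """Map a share value to its target column name, e.g. 0.15 -> 'share_15'."""
    return f"share_{round(value * 100)}"


def pivot_with_values(df):
    """Pivot on "share" with an explicit value list (hypothetical helper).

    Passing the values means Spark does not have to compute the distinct
    values of the pivot column first.
    """
    # Imported here so the pure-Python helper above works without Spark.
    from pyspark.sql import functions as F

    pivoted = (
        df.groupBy("gender", "pro", "week")
        .pivot("share", SHARE_VALUES)
        .agg(F.first("forecast"))
    )
    # Assumption: each pivoted column is named after the string form of
    # its value ('0.01', '0.1', ...), matching the question's renames.
    for value in SHARE_VALUES:
        pivoted = pivoted.withColumnRenamed(str(value), share_column_name(value))
    return pivoted
```

Calling pivot_with_values(df) on the dataframe from the question should give the same columns as the original chain of renames. Combinations with no matching row come back as null, as they do in the original code.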

Upvotes: 2
