Totalize count column with grand total

Question

I want to add a row with grand total of previously grouped rows. I have code:

df_join = ( 
       df.join(df1, df.serialnumber == df1.entityid)
          .distinct()
         .groupBy("SW_version").count().show(truncate=False)

I need to add the grand total row, summing all values in a count column.

For now the result of the code is:

 +-----------+-----+
 |SW_version |count|
 +-----------+-----+
 |SG4J000078C|63   |
 |SG4J000092C|670  |
 |SG4J000094C|43227|
 +-----------+-----+

barteksch · Accepted Answer

You can use rollup instead of groupBy in this case. Rollup will produce one additional row with null group and aggregation for all rows.

For df like this:

+-------+
|version|
+-------+
|      A|
|      A|
|      B|
|      B|
|      B|
|      C|
+-------+

df.rollup("version").count().sort("version", ascending=False).show() will return:

+-------+-----+
|version|count|
+-------+-----+
|      C|    1|
|      B|    3|
|      A|    2|
|   null|    6| <-- this is the grand total
+-------+-----+

You can read more about rollup in this post What is the difference between cube, rollup and groupBy operators?

Totalize count column with grand total

Answers (1)

Related Questions