Reputation: 3
I want to add a row with grand total of previously grouped rows. I have code:
df_join = (
df.join(df1, df.serialnumber == df1.entityid)
.distinct()
.groupBy("SW_version").count().show(truncate=False)
I need to add the grand total row, summing all values in a count column.
For now the result of the code is:
+-----------+-----+
|SW_version |count|
+-----------+-----+
|SG4J000078C|63 |
|SG4J000092C|670 |
|SG4J000094C|43227|
+-----------+-----+
Upvotes: 0
Views: 245
Reputation: 198
You can use rollup
instead of groupBy
in this case. Rollup will produce one additional row with null
group and aggregation for all rows.
For df
like this:
+-------+
|version|
+-------+
| A|
| A|
| B|
| B|
| B|
| C|
+-------+
df.rollup("version").count().sort("version", ascending=False).show()
will return:
+-------+-----+
|version|count|
+-------+-----+
| C| 1|
| B| 3|
| A| 2|
| null| 6| <-- this is the grand total
+-------+-----+
You can read more about rollup in this post What is the difference between cube, rollup and groupBy operators?
Upvotes: 1