Reputation: 63022
I want to do the equivalent of this basic SQL query:
select shipgrp, shipstatus, count(*) cnt
from shipstatus group by shipgrp, shipstatus
The examples I have seen for Spark DataFrames all aggregate over some other column, e.g.
df.groupBy($"shipgrp", $"shipstatus").agg(sum($"quantity"))
But my case above needs no other column, only a row count per group. What is the syntax or combination of method calls for that?
Update: A reader has suggested that this question is a duplicate of "dataframe: how to groupBy/count then filter on count in Scala", but that one is about filtering by count; there is no filtering here.
Upvotes: 15
Views: 34061
Reputation: 214927
You can similarly use count("*") inside Spark's agg function:
df.groupBy("shipgrp", "shipstatus").agg(count("*").as("cnt"))
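A small example with its output (in a standalone program this needs import spark.implicits._ for toDF and import org.apache.spark.sql.functions.count):
val df = Seq(("a", 1), ("a", 1), ("b", 2), ("b", 3)).toDF("A", "B")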
df.groupBy("A", "B").agg(count("*").as("cnt")).show
+---+---+---+
| A| B|cnt|
+---+---+---+
| b| 2| 1|
| a| 1| 2|
| b| 3| 1|
+---+---+---+
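For comparison, here is a minimal sketch of the same count written three ways against the column names from the question. The sample rows, the app name, and the local[*] master are assumptions made only for illustration; the view name shipstatus is taken from the question's SQL. It should give the same counts whichever variant you pick, though row order is not guaranteed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

// Local session for experimentation (assumed setup); getOrCreate() reuses
// an already running session if there is one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("group-count-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows standing in for the shipstatus table.
val df = Seq(
  ("g1", "shipped"),
  ("g1", "shipped"),
  ("g1", "pending"),
  ("g2", "shipped")
).toDF("shipgrp", "shipstatus")

// Variant 1: agg with count("*"), aliased to cnt as in the answer above.
val viaAgg = df.groupBy("shipgrp", "shipstatus").agg(count("*").as("cnt"))

// Variant 2: the count() shortcut on the grouped data. Its result column
// is named "count", so rename it to match the SQL alias.
val viaCount = df.groupBy("shipgrp", "shipstatus")
  .count()
  .withColumnRenamed("count", "cnt")

// Variant 3: register a temporary view and run the original SQL as-is.
df.createOrReplaceTempView("shipstatus")
val viaSql = spark.sql(
  "select shipgrp, shipstatus, count(*) cnt from shipstatus group by shipgrp, shipstatus"
)

viaAgg.show()
```

The count() shortcut is the shortest form when a plain row count is all you need; agg is the one to reach for if you later want several aggregates in the same pass.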
Upvotes: 24