Reputation: 325
I have a dataset
+----------+--------+------------+
|        id|    date|      errors|
+----------+--------+------------+
|         1|20170319|      error1|
|         1|20170319|      error2|
|         1|20170319|      error2|
|         1|20170319|      error1|
|         2|20170319|        err6|
|         1|20170319|      error2|
+----------+--------+------------+
I need the error counts per day.
Expected output:
+--------+------+-----+
|    date|errors|count|
+--------+------+-----+
|20170319|error1|    2|
|20170319|error2|    3|
|20170319|  err6|    1|
+--------+------+-----+
val dataset = spark.read.json(path)
val c = dataset.groupBy("date").count()
// how do I proceed to count the errors per day?
I tried windowing over date in Spark Scala SQL but couldn't get anywhere productive. Do I need to convert to an RDD and find an approach there?
Upvotes: 3
Views: 717
Reputation: 49260
You just need to groupBy both date and errors.
val c = dataset.groupBy("date", "errors").count()
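A minimal end-to-end sketch, assuming the JSON source and the date/errors column names from the question; the path, app name, and the descending sort are illustrative additions:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

val spark = SparkSession.builder().appName("errorCounts").getOrCreate()

// Hypothetical path; substitute the actual location of the JSON data.
val path = "/data/errors.json"
val dataset = spark.read.json(path)

val counts = dataset
  .groupBy("date", "errors") // one group per (date, error) pair
  .count()                   // adds a `count` column per group
  .orderBy(desc("count"))    // optional: most frequent errors first

counts.show()

Grouping by multiple columns produces one row per distinct (date, errors) pair with its count, which is exactly the shape of the expected output.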
Upvotes: 1