Group - Count from 2 columns from a Dataframe - Spark Java

Question

I have a Spark DataFrame with two columns, as shown below:

Date      | Area
1/1/2016  |  1
3/1/2016  |  4
1/1/2016  |  1
5/1/2016  |  2
1/1/2016  |  3
1/1/2016  |  1
3/1/2016  |  4
1/1/2016  |  2
3/1/2016  |  3
3/1/2016  |  3
1/1/2016  |  4
1/1/2016  |  4
1/1/2016  |  2

I want the following output:

 Day: 1/1/2016 -> There are 3 rows at Area1
               -> There are 2 rows at Area2
               -> There are 1 rows at Area3
               -> There are 2 rows at Area4
 Day: 3/1/2016 -> There are 0 rows at Area1
               -> There are 0 rows at Area2
               -> There are 2 rows at Area3
               -> There are 2 rows at Area4
 Day: 5/1/2016 -> ..........
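The desired output implies a count for every (Date, Area) combination, including combinations that have no rows at all. The plain-Java sketch below does not use Spark. It reproduces those counts from the sample table with nested `TreeMap`s and prints them in the requested layout, which is a convenient way to double-check the expected numbers. It assumes Area IDs run from 1 to 4, as in the sample. The class `ExpectedOutput` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExpectedOutput {

    // Assumed Area range, taken from the sample data (Area1..Area4)
    static final int MIN_AREA = 1;
    static final int MAX_AREA = 4;

    // Sample rows from the question: {Date, Area}
    static final List<String[]> SAMPLE = Arrays.asList(
            new String[] {"1/1/2016", "1"}, new String[] {"3/1/2016", "4"},
            new String[] {"1/1/2016", "1"}, new String[] {"5/1/2016", "2"},
            new String[] {"1/1/2016", "3"}, new String[] {"1/1/2016", "1"},
            new String[] {"3/1/2016", "4"}, new String[] {"1/1/2016", "2"},
            new String[] {"3/1/2016", "3"}, new String[] {"3/1/2016", "3"},
            new String[] {"1/1/2016", "4"}, new String[] {"1/1/2016", "4"},
            new String[] {"1/1/2016", "2"});

    // Counts rows per Date, then per Area; TreeMap keeps keys sorted
    // (Date is compared as a plain string, which only suits this sample)
    static Map<String, Map<Integer, Long>> countByDateAndArea(List<String[]> rows) {
        Map<String, Map<Integer, Long>> counts = new TreeMap<>();
        for (String[] row : rows) {
            counts.computeIfAbsent(row[0], date -> new TreeMap<>())
                  .merge(Integer.parseInt(row[1]), 1L, Long::sum);
        }
        return counts;
    }

    // Formats the counts like the desired output, writing 0 for areas with no rows
    static String format(Map<String, Map<Integer, Long>> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<Integer, Long>> day : counts.entrySet()) {
            String prefix = "Day: " + day.getKey() + " ";
            char[] blanks = new char[prefix.length()];
            Arrays.fill(blanks, ' ');
            String indent = new String(blanks);
            for (int area = MIN_AREA; area <= MAX_AREA; area++) {
                long n = day.getValue().getOrDefault(area, 0L);
                sb.append(area == MIN_AREA ? prefix : indent)
                  .append("-> There are ").append(n)
                  .append(" rows at Area").append(area).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(countByDateAndArea(SAMPLE)));
    }
}
```

Note that a plain `Map` only holds the pairs that occur, so the zero counts come from looping over the assumed area range when formatting.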

My Java 8 code so far is:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.*;
import org.apache.spark.sql.*;

public class Main {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.setAppName("My 1st Spark app");
        conf.setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SparkSession sparkSession = SparkSession.builder().sparkContext(sc.sc()).getOrCreate();

        // Read the CSV, using its first line as the header and inferring column types
        Dataset<Row> df = sparkSession.read()
                .option("header", true)
                .option("inferSchema", "true")
                .option("timestampFormat", "yyyy-MM-dd hh:mm:ss")
                .csv("hdfs://quickstart.cloudera:8020//user//cloudera//fares.csv");

        // Count rows per Date (this groups by Date only)
        Dataset<Row> counts = df.groupBy("Date").count();
        counts.show();
    }
}

But this groups the result by date only, not by area. How can I group by both date and area?
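Here is a minimal sketch of one way to do this, assuming the Spark 2.x Java API. `groupBy("Date", "Area").count()` returns one row per (Date, Area) pair that actually occurs. Pairs with no rows, such as Area1 on 3/1/2016, are therefore simply absent from the result. Pivoting on Area and filling the resulting nulls with 0 produces the zero counts shown in the desired output. The class name `DateAreaCounts` and its helper methods are hypothetical. The sample rows from the question are built in memory, with Date as a string and Area as an integer, rather than read from fares.csv.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class DateAreaCounts {

    // Builds the question's sample rows in memory (Date as a string, Area as an int)
    static Dataset<Row> sample(SparkSession spark) {
        StructType schema = new StructType(new StructField[] {
                DataTypes.createStructField("Date", DataTypes.StringType, false),
                DataTypes.createStructField("Area", DataTypes.IntegerType, false)});
        List<Row> rows = Arrays.asList(
                RowFactory.create("1/1/2016", 1), RowFactory.create("3/1/2016", 4),
                RowFactory.create("1/1/2016", 1), RowFactory.create("5/1/2016", 2),
                RowFactory.create("1/1/2016", 3), RowFactory.create("1/1/2016", 1),
                RowFactory.create("3/1/2016", 4), RowFactory.create("1/1/2016", 2),
                RowFactory.create("3/1/2016", 3), RowFactory.create("3/1/2016", 3),
                RowFactory.create("1/1/2016", 4), RowFactory.create("1/1/2016", 4),
                RowFactory.create("1/1/2016", 2));
        return spark.createDataFrame(rows, schema);
    }

    // One row per (Date, Area) pair that occurs, with a "count" column
    static Dataset<Row> countByDateAndArea(Dataset<Row> df) {
        return df.groupBy("Date", "Area").count().orderBy("Date", "Area");
    }

    // One row per Date, one column per Area value; pairs with no rows become 0
    static Dataset<Row> pivotByArea(Dataset<Row> df) {
        return df.groupBy("Date").pivot("Area").count().na().fill(0).orderBy("Date");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("date-area-counts")
                .master("local[*]")
                .getOrCreate();
        Dataset<Row> df = sample(spark);
        countByDateAndArea(df).show();
        pivotByArea(df).show();
        spark.stop();
    }
}
```

Because Date is kept as a string here, `orderBy("Date")` sorts lexicographically. That happens to match this sample, but it would misorder dates such as 10/1/2016, and parsing the column into a date type first would avoid the problem. The pivoted result has only one row per day, so it could be collected to the driver with `collectAsList()` and formatted into the "Day: ... -> There are N rows at AreaK" lines in plain Java.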
