hobiron

Reputation: 11

groupby with spark java

I can read data from a CSV with Spark, but I don't know how to groupBy on a specific field. I want to groupBy 'Name'. This is my code:

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;

public class readspark {
    public static void main(String[] args) {
        final ObjectMapper om = new ObjectMapper();
        System.setProperty("hadoop.home.dir", "D:\\Task\\winutils-master\\hadoop-3.0.0");
        SparkConf conf = new SparkConf()
                .setMaster("local[3]")
                .setAppName("Read Spark CSV")
                .set("spark.driver.host", "localhost");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        JavaRDD<String> lines = jsc.textFile("D:\\Task\\data.csv");
        JavaRDD<DataModel> rdd = lines.map(new Function<String, DataModel>() {
            @Override
            public DataModel call(String s) throws Exception {
                // Parse one CSV line into a DataModel.
                String[] dataArray = s.split(",");
                DataModel dataModel = new DataModel();
                dataModel.Name(dataArray[0]);
                dataModel.ID(dataArray[1]);
                dataModel.Addres(dataArray[2]);
                dataModel.Salary(dataArray[3]);
                return dataModel;
            }
        });
        rdd.foreach(new VoidFunction<DataModel>() {
            @Override
            public void call(DataModel dataModel) throws Exception {
                System.out.println(om.writeValueAsString(dataModel));
            }
        });
    }
}

Upvotes: 0

Views: 198

Answers (1)

Rishabh Sharma

Reputation: 862

Spark provides the groupBy functionality directly:

JavaPairRDD<String, Iterable<DataModel>> groupedRdd = rdd.groupBy(dataModel -> dataModel.getName());

This returns a pair RDD where the key is the Name (determined by the lambda passed to groupBy) and the value is an Iterable of the data models with that name.

If you want to change the grouping logic, all you need to do is provide the corresponding lambda.
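For example, here is a minimal sketch of consuming the grouped RDD (assuming DataModel exposes a getName() getter matching the setter-style calls in your snippet, and reusing your ObjectMapper for printing):

JavaPairRDD<String, Iterable<DataModel>> groupedRdd = rdd.groupBy(dataModel -> dataModel.getName());
groupedRdd.foreach(group -> {
    // group._1() is the Name key; group._2() holds all DataModels sharing that name.
    System.out.println("Name: " + group._1());
    for (DataModel dm : group._2()) {
        System.out.println("  " + om.writeValueAsString(dm));
    }
});

Grouping by a different field, say ID, is just a different lambda: rdd.groupBy(dataModel -> dataModel.getID()) (again assuming the corresponding getter exists).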

Upvotes: 1
