Reputation: 665
I am converting a Spark Dataset into a list of hash maps using the approach below. My end goal is to build either a list of JSON objects or a list of hash maps. I am running this code on 3.2 million rows:
List<HashMap> finalJsonMap = new ArrayList<HashMap>();

srcData.foreachPartition(new ForeachPartitionFunction<Row>() {
    public void call(Iterator<Row> t) throws Exception {
        while (t.hasNext()) {
            Row eachRow = t.next();
            HashMap rowMap = new HashMap();
            for (int j = 0; j < grpdColNames.size(); j++) {
                rowMap.put(grpdColNames.get(j), eachRow.getString(j));
            }
            finalJsonMap.add(rowMap);
        }
    }
});
The iteration works fine, but I am unable to add rowMap to finalJsonMap.
What is the best approach to do this?
Upvotes: 3
Views: 1446
Reputation: 56
That's really not how Spark works.
The code that is put in foreachPartition is executed in a different context than the original

List<HashMap> finalJsonMap = new ArrayList<HashMap>();

All you can do in such a setup is modify a local copy on each executor; the driver-side list is never updated.
This has been discussed multiple times on Stack Overflow and is described in detail in the Understanding Closures section of the official documentation.
Considering the required result (i.e. a local collection), there is really nothing you can do other than converting your code to use mapPartitions and collect. That is, however, hardly efficient or idiomatic in Spark.
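For completeness, a minimal sketch of that approach, assuming srcData is a Dataset&lt;Row&gt; and grpdColNames is an effectively final, serializable List&lt;String&gt; as in your snippet (the kryo encoder is just one way to get an Encoder for HashMap here):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import org.apache.spark.api.java.function.MapPartitionsFunction;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

// Build the maps on the executors...
Encoder<HashMap> mapEncoder = Encoders.kryo(HashMap.class);

List<HashMap> finalJsonMap = srcData
        .mapPartitions((MapPartitionsFunction<Row, HashMap>) rows -> {
            List<HashMap> maps = new ArrayList<>();
            while (rows.hasNext()) {
                Row eachRow = rows.next();
                HashMap rowMap = new HashMap();
                for (int j = 0; j < grpdColNames.size(); j++) {
                    rowMap.put(grpdColNames.get(j), eachRow.getString(j));
                }
                maps.add(rowMap);
            }
            return maps.iterator();
        }, mapEncoder)
        .collectAsList();  // ...then ship them all back to the driver

Keep in mind that collectAsList materializes the entire result on the driver, so with 3.2 million rows the driver needs enough memory to hold all of the maps at once.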
I'd strongly recommend rethinking your current design.
Upvotes: 3