Reputation: 665
I am converting a Spark Dataset into a list of hash maps using the approach below. My end goal is to build either a list of JSON objects or a list of hash maps. I am running this code on 3.2 million rows:
List<HashMap> finalJsonMap = new ArrayList<HashMap>();

srcData.foreachPartition(new ForeachPartitionFunction<Row>() {
    public void call(Iterator<Row> t) throws Exception {
        while (t.hasNext()) {
            Row eachRow = t.next();
            HashMap rowMap = new HashMap();
            for (int j = 0; j < grpdColNames.size(); j++) {
                rowMap.put(grpdColNames.get(j), eachRow.getString(j));
            }
            finalJsonMap.add(rowMap);
        }
    }
});
The iteration works fine, but I am unable to add rowMap to finalJsonMap.
What is the best approach to do this?
Upvotes: 3
Views: 1446
Reputation: 56
That's really not how Spark works.
The code that is put in foreachPartition is executed in a different context than the original

List<HashMap> finalJsonMap = new ArrayList<HashMap>();

All you can do in such a setup is modify a local copy on each executor; the driver-side list is never updated.
This has been discussed multiple times on Stack Overflow and is described in detail in the Understanding Closures section of the official documentation.
Considering the required result (i.e. a local collection), there is really nothing you can do other than converting your code to use mapPartitions and collect. That is, however, hardly efficient or idiomatic in Spark.
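For completeness, a minimal sketch of that approach, assuming srcData is a Dataset&lt;Row&gt; and grpdColNames is an effectively final, serializable List&lt;String&gt; as in your snippet (the kryo encoder is just one way to get an Encoder for HashMap here):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import org.apache.spark.api.java.function.MapPartitionsFunction;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

// Build the maps on the executors...
Encoder<HashMap> mapEncoder = Encoders.kryo(HashMap.class);

List<HashMap> finalJsonMap = srcData
        .mapPartitions((MapPartitionsFunction<Row, HashMap>) rows -> {
            List<HashMap> maps = new ArrayList<>();
            while (rows.hasNext()) {
                Row eachRow = rows.next();
                HashMap rowMap = new HashMap();
                for (int j = 0; j < grpdColNames.size(); j++) {
                    rowMap.put(grpdColNames.get(j), eachRow.getString(j));
                }
                maps.add(rowMap);
            }
            return maps.iterator();
        }, mapEncoder)
        .collectAsList();  // ...then ship them all back to the driver

Keep in mind that collectAsList materializes the entire result on the driver, so with 3.2 million rows the driver needs enough memory to hold all of the maps at once.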
I'd strongly recommend rethinking your current design.
Upvotes: 3