Reputation: 1538
I have a question about using Java collections in Spark programs. I came across the following in the Spark programming guide:
The first way to reduce memory consumption is to avoid the Java features that add overhead, such as pointer-based data structures and wrapper objects. There are several ways to do this:
Design your data structures to prefer arrays of objects, and primitive types, instead of the standard Java or Scala collection classes (e.g. HashMap). The fastutil library provides convenient collection classes for primitive types that are compatible with the Java standard library.
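To illustrate the guide's point (this example is mine, not from the guide), a lookup over boxed Integer keys can be replaced with parallel primitive arrays and a binary search, avoiding the wrapper objects and per-entry node overhead of a HashMap:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PrimitiveLookup {
    public static void main(String[] args) {
        // Boxed approach: every key and value is an Integer wrapper,
        // and each entry carries HashMap node overhead on top.
        Map<Integer, Integer> boxed = new HashMap<>();
        boxed.put(10, 100);
        boxed.put(20, 200);
        boxed.put(30, 300);

        // Primitive approach: two parallel int arrays, keys kept sorted
        // so binary search can locate an entry without any boxing.
        int[] keys = {10, 20, 30};
        int[] values = {100, 200, 300};

        int idx = Arrays.binarySearch(keys, 20);
        int fromArrays = values[idx];

        System.out.println(boxed.get(20));  // 200
        System.out.println(fromArrays);     // 200
    }
}
```

The fastutil library mentioned in the guide packages this idea into ready-made classes such as Int2IntOpenHashMap, so you keep a map-like API while storing primitives.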
Does this mean we shouldn't use Java collections and must use arrays of objects instead? Is the following code fine?
Map<String, String> lookUpMap = getLkp(path);
final Broadcast<<Map<String, String>> lookupBrdcst = sparkContext.broadcast(lookUpMap);
Upvotes: 2
Views: 731
Reputation: 13902
This is fine, assuming the size of the HashMap isn't too large. If it gets large you would probably want to use a join.
Your code does have a slight syntax error:
final Broadcast<<Map<String, String>> lookupBrdcst = sparkContext.broadcast(lookUpMap);
should be:
final Broadcast<Map<String, String>> lookupBrdcst = sparkContext.broadcast(lookUpMap);
You can see Java collections used as broadcast variables in the Spark examples themselves:
This example uses List<String> as a broadcast variable.
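For completeness, here is a minimal sketch of how the broadcast map might be consumed on the executors. The records RDD and the enrichment logic are hypothetical, added only for illustration; getLkp and path are from the original question:

```java
// Sketch only: assumes an existing JavaSparkContext `sparkContext`
// and an input JavaRDD<String> `records`; names are illustrative.
Map<String, String> lookUpMap = getLkp(path);
final Broadcast<Map<String, String>> lookupBrdcst =
        sparkContext.broadcast(lookUpMap);

JavaRDD<String> enriched = records.map(record -> {
    // value() returns the local copy of the map, shipped once
    // per executor rather than serialized with every task.
    Map<String, String> lookup = lookupBrdcst.value();
    String replacement = lookup.get(record);
    return replacement != null ? replacement : record;
});
```

This is exactly the pattern where a broadcast map works well: a small, read-only lookup table used inside a transformation. Once the map no longer fits comfortably in executor memory, converting it to an RDD or Dataset and joining is the safer route.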
Upvotes: 1