Reputation: 1329
I have an object that needs to be shared among partitions in Apache Spark. Below is the code snippet and the problem I'm facing.
    private static void processDataWithResult() throws IOException {
        JavaRDD<Long> idRDD = createIdRDDUsingDb();
        final MeasureReportingData measureReporingData = getMeasureReportingData(jobConfiguration);
        resultRDD = idRDD.mapPartitions(new FlatMapFunction<Iterator<Long>, Boolean>() {
            @Override
            public Iterable<Boolean> call(Iterator<Long> idIterator) throws Exception {
                MeasureReportingData mrd = measureReporingData;
                final List<Boolean> dummyList = new ArrayList<>();
                long minId = idIterator.next();
                engine.processInBatch(minId, minId + BATCH_SIZE - 1);
                return (Iterable<Boolean>) dummyList;
            }
        });
        resultRDD.count();
    }
I want to distribute the measureReportingData object to all the partitions.
I get serialization errors because MeasureReportingData contains instance members that are not Serializable. A simulation of the issue is described in this question: How to serialize a Predicate<T> from Nashorn engine in java 8
Is there another way to share measureReportingData among partitions?
Upvotes: 0
Views: 355
Reputation: 11381
To share data between machines, the data has to be serialized at the source, transferred over the network, and deserialized at the destination. So you cannot transfer non-serializable objects.
If MeasureReportingData is not serializable, you have to convert it into a serializable object, share that object, and then convert it back to MeasureReportingData inside the function.
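A minimal sketch of that conversion, using plain Java serialization so it runs without Spark. Every name here is hypothetical: ReportingData stands in for MeasureReportingData, ReportingSpec is an assumed serializable description of it, and roundTrip only simulates the trip to another machine. The real class may need more fields than a single threshold to be rebuilt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Predicate;

public class SerializableSpecDemo {

    // Hypothetical stand-in for MeasureReportingData: it is not Serializable
    // and holds a member (a plain lambda) that is not Serializable either.
    static final class ReportingData {
        final long threshold;
        final Predicate<Long> filter;

        ReportingData(long threshold) {
            this.threshold = threshold;
            this.filter = id -> id >= threshold;
        }
    }

    // Assumed serializable form: only the plain values needed to rebuild ReportingData.
    static final class ReportingSpec implements Serializable {
        private static final long serialVersionUID = 1L;
        final long threshold;

        ReportingSpec(long threshold) {
            this.threshold = threshold;
        }

        // Recreates the non-serializable object on the receiving side.
        ReportingData toReportingData() {
            return new ReportingData(threshold);
        }
    }

    // Simulates shipping a value to another JVM: serialize to bytes, then deserialize.
    static <T extends Serializable> T roundTrip(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        ReportingSpec spec = new ReportingSpec(100L);       // built on the "driver"
        ReportingSpec received = roundTrip(spec);           // crosses the "network"
        ReportingData rebuilt = received.toReportingData(); // rebuilt on the "executor"
        System.out.println(rebuilt.filter.test(150L));      // prints true
        System.out.println(rebuilt.filter.test(50L));       // prints false
    }
}
```

In the Spark job, the anonymous FlatMapFunction would capture the serializable spec instead of measureReporingData and call something like toReportingData() once at the top of call(), so the object is rebuilt once per partition rather than once per record.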
Upvotes: 1