Reputation: 5595
I'm trying to map a function across a JavaRDD in Spark, and I keep getting a NotSerializableException on the map call.
public class SparkPrunedSet extends AbstractSparkSet {
    private final ColumnPruner pruner;

    public SparkPrunedSet(@JsonProperty("parent") SparkSet parent, @JsonProperty("pruner") ColumnPruner pruner) {
        super(parent);
        this.pruner = pruner;
    }

    public JavaRDD<Record> getRdd(SparkContext context) {
        JavaRDD<Record> rdd = getParent().getRdd(context);
        Function<Record, Record> mappingFunction = makeRecordTransformer(pruner);
        // The line below throws the error
        JavaRDD<Record> mappedRdd = rdd.map(mappingFunction);
        return mappedRdd;
    }

    private Function<Record, Record> makeRecordTransformer(ColumnPruner pruner) {
        return new Function<Record, Record>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Record call(Record record) throws Exception {
                // Obviously I'd like to do something more useful in here, but this is enough
                // to throw the error
                return record;
            }
        };
    }
}
When it runs, I get: java.io.NotSerializableException: com.package.SparkPrunedSet
Record is an interface that extends Serializable, and MapRecord is an implementation of it. Similar code exists and works elsewhere in the codebase, except it uses rdd.filter instead. I've read through most of the other Stack Overflow entries on this, and none of them seem to help. I thought it might have to do with trouble serializing SparkPrunedSet (although I don't understand why it would even need to do this), so I set all of its fields to transient, but that didn't help either. Does anyone have any ideas?
Upvotes: 0
Views: 327
Reputation: 34628
The Function you are creating for the transformation is, in fact, an anonymous inner class of SparkPrunedSet, so every instance of that function holds an implicit reference to the SparkPrunedSet object that created it. Serializing the function therefore requires serializing SparkPrunedSet as well.
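One way around this (a minimal sketch, assuming the Record and ColumnPruner types from the question; the class name PruneFunction is made up) is to move the function into a static nested class, or a separate top-level class, so it no longer captures the enclosing SparkPrunedSet:

// Static nested class inside SparkPrunedSet: no implicit reference to the outer instance.
private static final class PruneFunction implements Function<Record, Record> {
    private static final long serialVersionUID = 1L;
    private final ColumnPruner pruner; // ColumnPruner must itself be Serializable

    PruneFunction(ColumnPruner pruner) {
        this.pruner = pruner;
    }

    @Override
    public Record call(Record record) throws Exception {
        // Apply the pruning here; returning the record unchanged is enough
        // to show that nothing from SparkPrunedSet is captured.
        return record;
    }
}

// Usage inside getRdd:
JavaRDD<Record> mappedRdd = rdd.map(new PruneFunction(pruner));

Alternatively, you could make SparkPrunedSet itself implement Serializable, but that pulls the whole enclosing object (and all of its fields) into the closure, which is usually more than you want to ship to the executors.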
Upvotes: 2