sfendell

Reputation: 5595

Why am I getting a NotSerializableException here?

I'm trying to map a function over a JavaRDD in Spark, and I keep getting a NotSerializableException on the map call.

public class SparkPrunedSet extends AbstractSparkSet {

    private final ColumnPruner pruner;

    public SparkPrunedSet(@JsonProperty("parent") SparkSet parent, @JsonProperty("pruner") ColumnPruner pruner) {
        super(parent);
        this.pruner = pruner;
    }

    public JavaRDD<Record> getRdd(SparkContext context) {
        JavaRDD<Record> rdd = getParent().getRdd(context);
        Function<Record, Record> mappingFunction = makeRecordTransformer(pruner);

        // The line below throws the exception
        JavaRDD<Record> mappedRdd = rdd.map(mappingFunction);
        return mappedRdd;
    }

    private Function<Record, Record> makeRecordTransformer(final ColumnPruner pruner) {
        return new Function<Record, Record>() {

            private static final long serialVersionUID = 1L;

            @Override
            public Record call(Record record) throws Exception {
                // Obviously I'd like to do something more useful in here, but this is enough
                // to throw the error
                return record;
            }
        };
    }
}

When it runs, I get: java.io.NotSerializableException: com.package.SparkPrunedSet

Record is an interface that extends Serializable, and MapRecord is an implementation of it. Similar code exists and works elsewhere in the codebase, except that it uses rdd.filter instead of rdd.map. I've read through most of the other Stack Overflow questions on this, and none of them seem to help. I thought it might have to do with trouble serializing SparkPrunedSet (although I don't understand why it would even need to be serialized), so I marked all of its fields transient, but that didn't help either. Does anyone have any ideas?

Upvotes: 0

Views: 327

Answers (1)

RealSkeptic

Reputation: 34628

The Function you are creating for the transformation is an anonymous inner class of SparkPrunedSet. Every instance of that function therefore holds an implicit reference to the SparkPrunedSet object that created it.

As a result, serializing the function also requires serializing SparkPrunedSet. Marking the fields of SparkPrunedSet transient does not help, because the class itself still does not implement Serializable.
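The effect can be reproduced without Spark. The following is a minimal sketch using plain Java serialization, not the asker's actual classes: `Fn`, `PrunedSet`, `prefix`, and `canSerialize` are hypothetical stand-ins for Spark's `Function`, `SparkPrunedSet`, the `pruner` field, and Spark's closure serialization. It compares an anonymous class created in an instance method, which reads a field of the enclosing object, with the same anonymous class created in a static method.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class ClosureSerializationDemo {

    // Hypothetical stand-in for Spark's Function: a Serializable callback.
    interface Fn<T, R> extends Serializable {
        R call(T value) throws Exception;
    }

    // Hypothetical stand-in for SparkPrunedSet. Note: it is NOT Serializable.
    static class PrunedSet {
        private final String prefix;

        PrunedSet(String prefix) {
            this.prefix = prefix;
        }

        // Anonymous class created in an instance method. Reading `prefix`
        // really means PrunedSet.this.prefix, so the function object keeps a
        // hidden reference to the whole PrunedSet.
        Fn<String, String> capturingFunction() {
            return new Fn<String, String>() {
                private static final long serialVersionUID = 1L;

                @Override
                public String call(String value) {
                    return prefix + value;
                }
            };
        }

        // Same logic built in a static method: there is no enclosing
        // instance, and only the (Serializable) String argument is captured.
        static Fn<String, String> standaloneFunction(final String prefix) {
            return new Fn<String, String>() {
                private static final long serialVersionUID = 1L;

                @Override
                public String call(String value) {
                    return prefix + value;
                }
            };
        }
    }

    // Returns true if Java serialization accepts the whole object graph.
    static boolean canSerialize(Object candidate) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // false: the function drags the non-serializable PrunedSet along
        System.out.println(canSerialize(new PrunedSet("x").capturingFunction()));
        // true: nothing but a String is captured
        System.out.println(canSerialize(PrunedSet.standaloneFunction("x")));
    }
}
```

Applied to the question, this suggests building the Function in a static method (or a static nested class) and passing the pruner in as an argument. That assumes ColumnPruner is itself Serializable, which the question does not state.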

Upvotes: 2
