robi
robi

Reputation: 123

Apache Beam Pipeline - Serialization problem

My Beam pipeline seems to work properly, but produces a lot of warnings:

[direct-runner-worker] WARN org.apache.beam.sdk.util.MutationDetectors - Coder of type class org.apache.beam.sdk.coders.SerializableCoder has a #structuralValue method which does not return true when the encoding of the elements is equal. Element com.emarsys.dataflow.templates.MetricBundle@73d85187

The class looks like this:

@AllArgsConstructor
@Data
public class MetricBundle implements Serializable {
    private Long customerId;
    private String query;
    private Map<PrimaryKey, Metric> metrics;
}

The problem is with the Map. PrimaryKey and Metric classes are composed of simple types, and implement the Serializable interface.

The transform producing the PCollection<MetricBundle> is using .setCoder(SerializableCoder.of(MetricBundle.class));

Do I have to implement a custom coder in order to serialize Map properly? During testing, I found no conflicts in the result, all elements are handled properly.

Upvotes: 1

Views: 1884

Answers (1)

robertwb
robertwb

Reputation: 5104

This is warning that due to the use of SerializableCoder it's not able to check certain properties of your pipeline. (SerializableCoder is also very inefficient.) I would recommend using a more specific coder here, but you don't have to necessarily write one of your own. E.g. it looks like you could decorate your class with @DefaultSchema(JavaFieldSchema.class). See https://beam.apache.org/documentation/programming-guide/#schemas

Upvotes: 2

Related Questions