Reputation: 14791
Pulled the latest SDK version (0.4.150414) from Maven, and our jobs are now failing.
We've traced it to the deserialization of a HashMap that is used in one of our classes and is referenced by the ParDo transformation.
Observations:
- The HashMap is populated before processElement is invoked.
- Inspecting the HashMap inside the processElement method shows that it has a different object ID (which must come from deserializing the original HashMap), but it is now empty, i.e. all elements have been lost.

Did anything change with the serialization/deserialization functionality in the latest version of the SDK?
Happy to send our code to the feedback email if you need it.
Upvotes: 3
Views: 97
Reputation: 6130
A change was made in the latest version to clone the DoFn when it is passed to ParDo.of. This gives better behavior if the same DoFn is used multiple times and modified between uses.
The problem you describe would happen if the HashMap field was populated after the DoFn was passed to ParDo.of.
You can confirm this by setting a breakpoint at ParDo.of and inspecting the state of the DoFn there. To fix this, initialize the field before invoking ParDo.of.
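
To illustrate why the ordering matters, here is a minimal sketch in plain Java that does not use the SDK at all. It assumes the clone is a serialization round trip, which is a guess based on the question, not something confirmed about the SDK. `LookupFn` and `cloneViaSerialization` are hypothetical stand-ins for the DoFn and for whatever ParDo.of does internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class CloneTimingDemo {

    // Hypothetical stand-in for a DoFn that carries a HashMap field.
    static class LookupFn implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, Integer> lookup = new HashMap<>();
    }

    // Assumption: the copy is made by serializing and deserializing the object.
    // This is NOT the SDK's code, only an imitation of "clone at ParDo.of".
    static <T extends Serializable> T cloneViaSerialization(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Clone failed", e);
        }
    }

    // Field populated AFTER the clone is taken: the clone never sees the entry.
    public static int sizeWhenPopulatedAfterClone() {
        LookupFn fn = new LookupFn();
        LookupFn copy = cloneViaSerialization(fn);
        fn.lookup.put("a", 1);
        return copy.lookup.size();
    }

    // Field populated BEFORE the clone is taken: the clone carries the entry.
    public static int sizeWhenPopulatedBeforeClone() {
        LookupFn fn = new LookupFn();
        fn.lookup.put("a", 1);
        LookupFn copy = cloneViaSerialization(fn);
        return copy.lookup.size();
    }

    public static void main(String[] args) {
        System.out.println("populated after clone:  " + sizeWhenPopulatedAfterClone());
        System.out.println("populated before clone: " + sizeWhenPopulatedBeforeClone());
    }
}
```

In the first case the copy keeps the empty map that existed when the clone was taken, which matches the symptom in the question: a different object ID and no elements.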
Hope this helps!
Upvotes: 5