Reputation: 641
Version of apache beam is 2.15.0
.
In this code , class Airport is used as Key for KV instance and at the end, mean is calculated for each Airport instance .
c.output(KV.of(stats.airport, stats.timestamp));
But how does apache beam internally compare two keys and return if two instances are same or not ? Are two instances treated same if all the class members has same values ? Document does not mention about the comparison for two keys.
I appreciate if someone can help me out with understanding.
Upvotes: 0
Views: 405
Reputation: 7058
This is actually explained in the GroupByKey
transform docs, which is the operation done under the hood for a Mean
aggregation:
Two keys of type
K
are compared for equality not by regular JavaObject.equals
(java.lang.Object
), but instead by first encoding each of the keys using theCoder
of the keys of the inputPCollection
, and then comparing the encoded bytes. This admits efficient parallel evaluation. Note that this requires that theCoder
of the keys be deterministic (seeCoder.verifyDeterministic()
). If the keyCoder
is not deterministic, an exception is thrown at pipeline construction time.
Note that Mean
uses Combine.PerKey
which is a 'shorthand' for GroupByKey
+ Combine.GroupedValues
.
Upvotes: 1