Reputation: 37
A simple example: suppose the data RDD stores records as (Int, Double) pairs.
When I perform a reduce, e.g. data.reduceByKey { case (a, b) => a + b }, a question crosses my mind: what if the input data is large enough that the accumulated value exceeds the maximum Double value? Does Spark handle this problem?
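For concreteness, a minimal sketch of the setup described above; the SparkContext sc and the sample values are assumptions for illustration only:

// Hypothetical setup: an RDD of (Int, Double) pairs summed per key.
// "sc" is assumed to be an existing SparkContext (e.g. in spark-shell).
val data: org.apache.spark.rdd.RDD[(Int, Double)] =
  sc.parallelize(Seq((1, 1.5), (1, 2.5), (2, 3.0)))

// Sums the Double values for each key: (1, 4.0) and (2, 3.0) here.
val sums = data.reduceByKey { case (a, b) => a + b }
sums.collect().foreach(println)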
Upvotes: 1
Views: 136
Reputation: 450
If, for example, you are programming in Java, it is not Spark's fault that Java's double type has a maximum value. The developer needs to take steps to avoid this if he or she believes it is a possibility given the inputs, for example by using BigDecimal instead of double.
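As a rough sketch of that suggestion in Spark's Scala API (again assuming a SparkContext sc and made-up sample values), the RDD can simply hold BigDecimal values instead of Double:

// Hedged sketch: store BigDecimal values so per-key sums are not bounded
// by Double.MaxValue. "sc" and the sample data are assumptions.
val bigData: org.apache.spark.rdd.RDD[(Int, BigDecimal)] =
  sc.parallelize(Seq(
    (1, BigDecimal(Double.MaxValue)),
    (1, BigDecimal(Double.MaxValue))
  ))

// BigDecimal addition is not limited to Double's range, so this sum
// does not overflow.
val bigSums = bigData.reduceByKey(_ + _)
bigSums.collect().foreach(println)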
Remember that Spark is not responsible for the behaviour of the types (int, double, whatever) that are stored in RDDs.
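To see that the behaviour at the boundary comes from the type itself rather than from Spark, a quick check in plain Scala (no Spark involved) is enough; an RDD of Doubles simply inherits this:

// Plain Scala: a Double sum that passes the upper bound saturates,
// it does not raise an error.
val nearMax = Double.MaxValue
println(nearMax + nearMax)   // prints Infinity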
(EDIT) Note Patricia's comment below. The question and answer are still relevant, though, if you ignore the particular example given of double and BigDecimal.
Upvotes: 3