Reputation: 2074
I have the following code:
minTotal = numRDD.reduceByKey(min).collect()
maxTotal = numRDD.reduceByKey(max).collect()
A sample from my dataset that is acting strangely:
(18, [u'300.0', u'1000.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'1000.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0', u'300.0']
The min is reported as 1000 and the max as 300.
This is very odd to me: all my other key/value pairs report correctly except for this one. I'm not sure what is going on here.
Upvotes: 0
Views: 1471
Reputation: 2074
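I forgot that the values are unicode strings, so min and max compare them as strings (character by character) rather than by their numeric value. Because '1' sorts before '3', u'1000.0' compares as smaller than u'300.0'. Converting the values to float before reducing gives the correct answer.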
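A minimal sketch of the difference, in plain Python so it runs without Spark. The `values` list is a shortened, hypothetical stand-in for the data under key 18, and `functools.reduce` only imitates what `reduceByKey` does to the values of a single key:

```python
from functools import reduce

# Hypothetical, shortened stand-in for the values stored under key 18.
values = [u'300.0', u'1000.0', u'300.0']

# Strings compare character by character: '1' < '3',
# so u'1000.0' is the "smallest" string and u'300.0' the "largest".
str_min = reduce(min, values)
str_max = reduce(max, values)

# Converting to float first restores numeric ordering.
floats = [float(v) for v in values]
num_min = reduce(min, floats)
num_max = reduce(max, floats)

print(str_min, str_max)  # string ordering (the surprising result)
print(num_min, num_max)  # numeric ordering (the expected result)
```

In PySpark, assuming numRDD holds (key, string) pairs, the same conversion could probably be applied before reducing, for example numRDD.mapValues(float).reduceByKey(min).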
Upvotes: 1