convolutionBoy

Reputation: 831

Find the max value per key in an RDD with reduceByKey and return the associated value of a different variable

I have an RDD whose records have three fields (Id, value1 and value2), mapped like this:

rdd = rdd.map(lambda x: (x['Id'],[float(x['value1']),int(x['value2'])]))

For each key, I want to return the entire record where value1 is maximised. I know I could do

rddMax = rdd.mapValues(lambda v: v[0]).reduceByKey(max)

and then join it back, but I just want one clean operation that finds the max value grouped by key and returns the entire record associated with it.

I also do not want to put the data in a DataFrame under any circumstances.

thanks

Upvotes: 0

Views: 2459

Answers (1)

user6022341


Try this:

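>>> rdd = rdd.map(lambda x:
...     (x['Id'], (float(x['value1']), int(x['value2']))))
>>> # Keep whichever (value1, value2) pair has the larger value1.
>>> # Written without tuple parameters, which Python 3 lambdas do not allow.
>>> rdd.reduceByKey(lambda a, b: a if a[0] > b[0] else b)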
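
To see what the reducer does without a Spark context, here is a minimal plain-Python sketch. The helper `reduce_by_key_local` is a hypothetical stand-in for `reduceByKey` (it is not part of PySpark), and the sample records are made up for illustration; only the comparison logic mirrors the answer above.

```python
from collections import defaultdict
from functools import reduce


def keep_larger_value1(a, b):
    """Return whichever (value1, value2) pair has the larger value1 (b on ties)."""
    return a if a[0] > b[0] else b


def reduce_by_key_local(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey, for illustration only."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Fold each key's values pairwise with func, as reduceByKey does per key.
    return {key: reduce(func, values) for key, values in grouped.items()}


# Made-up sample records in the (Id, (value1, value2)) shape.
records = [
    ("a", (1.5, 10)),
    ("a", (3.0, 20)),
    ("b", (2.0, 30)),
    ("a", (2.5, 40)),
    ("b", (0.5, 50)),
]

result = reduce_by_key_local(records, keep_larger_value1)
print(result)
```

Because the reducer returns the whole pair rather than just value1, value2 travels along with the maximum and no join is needed. On a real RDD the same function should be usable directly, as in rdd.reduceByKey(keep_larger_value1).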

Upvotes: 3
