Chandra

Reputation: 199

How to sort the values produced by groupByKey by their key in Spark

I need help sorting by key within the output of groupByKey:

val skuRDD2: RDD[(String, Iterable[(String, ImageInfo2)])] = DF.select("ID", "TAG", "MEDIA_ID", "IMAGE_NAME", "PATH").rdd
  .map(r => (r.getString(0), (r.getString(1), ImageInfo2(r.getString(2), r.getString(3), r.getString(4)))))
  .groupByKey()

I want to sort on TAG, i.e. the key of each pair in the Iterable[(String, ImageInfo2)] in the groupByKey output above.

Input (the groupByKey output above):

(skuid,Map(largeImage_4 -> [Media/Device Images/Large Images/Huawei Images Large/GR5GRY-4,m110005,GR5GRY-4], largeImage_1 -> [Media/Device Images/Large Images/Huawei Images Large/GR5GRY-1,m110002,GR5GRY-1]))

Expected output:

(skuid,Map(largeImage_1 -> [Media/Device Images/Large Images/Huawei Images Large/GR5GRY-1,m110002,GR5GRY-1], largeImage_4 -> [Media/Device Images/Large Images/Huawei Images Large/GR5GRY-4,m110005,GR5GRY-4]))

Can someone help me?

Thanks,

Upvotes: 0

Views: 749

Answers (1)

Ramesh Maharjan

Reputation: 41987

From the data you provided, all you are missing is a simple mapValues call that sorts the values within each group:

.mapValues(x => x.toList.sortBy(y => y._1))

So your code should be:

val skuRDD2: RDD[(String, Iterable[(String, ImageInfo2)])] = DF.select("ID", "TAG", "MEDIA_ID", "IMAGE_NAME", "PATH").rdd
  .map(r => (r.getString(0), (r.getString(1), ImageInfo2(r.getString(2), r.getString(3), r.getString(4)))))
  .groupByKey()
  .mapValues(x => x.toList.sortBy(y => y._1))
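If you want to check the sorting step without a Spark cluster, here is a minimal sketch that uses plain Scala collections in place of the RDD. The ImageInfo2 case class and its field names are my assumption (the question does not show its definition), and groupBy followed by map only stands in for groupByKey().mapValues(...); the sorting function itself is the same one used above.

```scala
// Hypothetical stand-in for the question's ImageInfo2; the field names are assumed.
case class ImageInfo2(mediaId: String, imageName: String, path: String)

object SortGroupedValues {
  // The same function passed to mapValues above: sort (TAG, ImageInfo2) pairs by TAG.
  def sortByTag(values: Iterable[(String, ImageInfo2)]): List[(String, ImageInfo2)] =
    values.toList.sortBy(_._1)

  // Rows shaped like the question's (ID, (TAG, ImageInfo2)) pairs, deliberately out of order.
  val rows: Seq[(String, (String, ImageInfo2))] = Seq(
    ("skuid", ("largeImage_4", ImageInfo2("m110005", "GR5GRY-4",
      "Media/Device Images/Large Images/Huawei Images Large/GR5GRY-4"))),
    ("skuid", ("largeImage_1", ImageInfo2("m110002", "GR5GRY-1",
      "Media/Device Images/Large Images/Huawei Images Large/GR5GRY-1")))
  )

  // Plain-collections stand-in for groupByKey().mapValues(sortByTag).
  val sorted: Map[String, List[(String, ImageInfo2)]] =
    rows.groupBy(_._1).map { case (id, group) => id -> sortByTag(group.map(_._2)) }

  def main(args: Array[String]): Unit =
    println(sorted("skuid").map(_._1)) // List(largeImage_1, largeImage_4)
}
```

One thing to keep in mind: sortBy on a String key compares lexicographically, so a tag such as largeImage_10 would come before largeImage_2. If your tags can go past a single digit, you may prefer to sort on the parsed numeric suffix instead.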

I hope the answer is helpful.

Upvotes: 3
