I am fetching IP addresses from a log file and counting how often each one occurs. Now I want to sort the resulting JavaPairRDD by its count value. Here is my code:
// Sum the per-IP counts held in the pairs RDD.
JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer v1, Integer v2) throws Exception {
        // Add two partial counts for the same IP.
        return v1 + v2;
    }
});
The JavaPairRDD above holds the count for each IP, and now I want to sort it. For example, the output currently looks like this:
(172.16.0.0,125)
(192.168.0.0,12)
(127.168.0.44,92)
The second value is the count for that particular IP.
JavaPairRDD has no built-in method for sorting by value: sortByKey sorts only by the key. As a workaround, you can swap each key and value and then sort by the new key (the count).
See the related JIRA issue: https://issues.apache.org/jira/browse/SPARK-3655
Swap the key and value with this code:
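// Turn each (ip, count) pair into (count, ip) so the count becomes the key.
JavaPairRDD<Integer, String> swapped = counts.mapToPair(new PairFunction<Tuple2<String, Integer>, Integer, String>() {
    @Override
    public Tuple2<Integer, String> call(Tuple2<String, Integer> item) throws Exception {
        // scala.Tuple2.swap() returns a new tuple with the elements reversed.
        return item.swap();
    }
});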
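Putting the whole workaround together: swap, sort with sortByKey(false) for descending counts, then swap back to (ip, count). This is a minimal sketch, assuming Java 8+ (lambdas in place of the anonymous classes above) and a local Spark master for testing. The class name SortIpCounts, the helper sortByCountDescending, and the use of the question's sample values as input are illustrative only, not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SortIpCounts {

    // Hypothetical helper: sorts (ip, count) pairs by count, highest first,
    // using the swap / sortByKey / swap-back workaround.
    static JavaPairRDD<String, Integer> sortByCountDescending(JavaPairRDD<String, Integer> counts) {
        return counts
                .mapToPair(t -> t.swap())   // (ip, count) -> (count, ip)
                .sortByKey(false)           // sort by count, descending
                .mapToPair(t -> t.swap());  // (count, ip) -> (ip, count); order is preserved
    }

    public static void main(String[] args) {
        // Local master for illustration; a real job would use the cluster's configuration.
        SparkConf conf = new SparkConf().setAppName("SortIpCounts").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Sample data copied from the question.
            JavaPairRDD<String, Integer> counts = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>("172.16.0.0", 125),
                    new Tuple2<>("192.168.0.0", 12),
                    new Tuple2<>("127.168.0.44", 92)));

            List<Tuple2<String, Integer>> sorted = sortByCountDescending(counts).collect();
            sorted.forEach(System.out::println);
        }
    }
}
```

Running main should print (172.16.0.0,125), (127.168.0.44,92), (192.168.0.0,12), one per line. Pass true to sortByKey (or call sortByKey()) for ascending order instead. IPs with equal counts come out in no guaranteed relative order.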