Bhaumik Thakkar

Reputation: 580

How to sort a JavaPairRDD by value in Apache Spark

I am extracting IP addresses from a log file and counting how often each one occurs. Now I want to sort the resulting JavaPairRDD by its count value. See the code below.

JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
        @Override
        public Integer call(Integer v1, Integer v2) throws Exception {
            // Add up the partial counts for the same IP address.
            return v1 + v2;
        }
    });

The JavaPairRDD above holds the count for each IP, and now I want to sort it. For example, the output looks like this:

(172.16.0.0,125)
(192.168.0.0,12)
(127.168.0.44,92)

The second value is the count for that particular IP.

Upvotes: 3

Views: 1170

Answers (1)

Rishi

Reputation: 1183

Spark does not support sorting a pair RDD by value directly. As a workaround, you can swap the key and value of each pair and then sort by key.

See: https://issues.apache.org/jira/browse/SPARK-3655

Swap the key and value using this code:

JavaPairRDD<Integer, String> swapped = counts.mapToPair(new PairFunction<Tuple2<String, Integer>, Integer, String>() {
        @Override
        public Tuple2<Integer, String> call(Tuple2<String, Integer> item) throws Exception {
            // Turn (ip, count) into (count, ip) so the count becomes the key.
            return item.swap();
        }
    });
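
After the swap, the remaining steps would be to call sortByKey(false) on swapped for descending order and, if you want (ip, count) pairs again, to run a second mapToPair with swap(). As a rough illustration only, the sketch below models those three steps on a local list in plain Java so it can run without a Spark cluster. The class name SortByValueSketch and the method sortByCountDescending are hypothetical and are not part of Spark; the sample data is taken from the question.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of swap -> sort by key -> swap back (no Spark required).
// Class and method names are hypothetical, chosen for illustration only.
class SortByValueSketch {

    // Takes (ip, count) pairs and returns them ordered by count, highest first.
    static List<Map.Entry<String, Integer>> sortByCountDescending(
            List<Map.Entry<String, Integer>> counts) {

        // Step 1: swap to (count, ip), like mapToPair with item.swap().
        List<Map.Entry<Integer, String>> swapped = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts) {
            swapped.add(new SimpleEntry<>(e.getValue(), e.getKey()));
        }

        // Step 2: sort by the new key in descending order,
        // like calling sortByKey(false) on the swapped RDD.
        swapped.sort(Map.Entry.<Integer, String>comparingByKey().reversed());

        // Step 3: swap back to (ip, count), like a second mapToPair with swap().
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>();
        for (Map.Entry<Integer, String> e : swapped) {
            sorted.add(new SimpleEntry<>(e.getValue(), e.getKey()));
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Sample counts from the question.
        List<Map.Entry<String, Integer>> counts = new ArrayList<>();
        counts.add(new SimpleEntry<>("172.16.0.0", 125));
        counts.add(new SimpleEntry<>("192.168.0.0", 12));
        counts.add(new SimpleEntry<>("127.168.0.44", 92));

        for (Map.Entry<String, Integer> e : sortByCountDescending(counts)) {
            System.out.println("(" + e.getKey() + "," + e.getValue() + ")");
        }
    }
}
```

With the sample data, this prints the pairs in the order 125, 92, 12. In Spark itself, passing true to sortByKey (or calling it with no argument) would give ascending order instead.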

Upvotes: 4
