Reputation: 1
To elaborate on what I'm stuck on: I have a JavaPairRDD "media" that holds pairs of integers, a followed id (the key) and a follower id (the value). I'm trying to count how many times each key (followed id) appears in "media". The problem is that each key's value is another id rather than simply 1. So I tried using .mapToPair to build a new Tuple2<>(p._1, 1) for each pair, so that every key holds the value 1 and counting becomes easier, and then followed that with reduceByKey(). However, I keep getting an error, and I'm not sure how to get the result back as a new JavaPairRDD of <id, count>. Here is the code I have written so far:
JavaPairRDD<Integer, Integer> socials =
media.mapToPair(p -> new Tuple2<>(p._1, 1))
.reduceByKey(p2 -> p._1 + p._2);
Upvotes: 0
Views: 299
Reputation: 1601
Let's say you have an RDD<Int,Int> with these tuples:
(4,5)
(1,7)
(1,3)
(3,4)
(2,3)
(1,2)
From what I understood, you want to count how many times each key is repeated, so the result should be something like this:
1, 3
2, 1
3, 1
4, 1
If this is what you want, you can do it as follows (Scala syntax; note that countByValue() returns a local Map on the driver rather than an RDD):
media.map(x => x._1).countByValue()
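If you need the result as an RDD rather than a local map, the error in the question most likely comes from the lambda passed to reduceByKey(): it receives two values that share a key and combines them, so it needs two parameters, e.g. `(a, b) -> a + b`, whereas `p2 -> p._1 + p._2` takes one parameter and refers to an undefined `p`. As a rough illustration of what that pipeline computes, here is a plain-Java sketch with no Spark dependency. The class name `FollowerCount`, the method `countByKey`, and the `int[][]` standing in for the RDD are my own assumptions for the example; the data is the sample from above:

```java
import java.util.Map;
import java.util.TreeMap;

class FollowerCount {
    // Local stand-in for mapToPair(p -> (p._1, 1)) followed by
    // reduceByKey((a, b) -> a + b): every pair contributes the value 1
    // under its key, and the two-argument function sums values per key.
    static Map<Integer, Integer> countByKey(int[][] pairs) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int[] p : pairs) {
            int followedId = p[0]; // the key; p[1] (the follower id) is ignored
            counts.merge(followedId, 1, (a, b) -> a + b);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Each row is {followedId, followerId}
        int[][] media = {{4, 5}, {1, 7}, {1, 3}, {3, 4}, {2, 3}, {1, 2}};
        System.out.println(countByKey(media)); // prints {1=3, 2=1, 3=1, 4=1}
    }
}
```

In Spark itself the same idea would read `media.mapToPair(p -> new Tuple2<>(p._1, 1)).reduceByKey((a, b) -> a + b)`, which should give you a JavaPairRDD<Integer, Integer> of (id, count).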
Good luck!
Upvotes: 0