Reputation: 1040
I have a JavaPairRDD<String, List<Tuple2<Integer, Integer>>> named rddA. For example, after collecting rddA: [(word1,[(187,267), (224,311), (187,110)]), (word2,[(187,200), (10,90)])]. Here word1 is a key and its value is [(187,267), (224,311), (187,110)].
How can I define the corresponding JavaPairRDD<Integer, List<Integer>> to get the following output:
[(187, [267, 110, 200]), (224,[311]), (10,[90])]
So the resulting JavaPairRDD has three keys: 187, 224 and 10. For example, the key 187 has [267, 110, 200] as its list value.
Upvotes: -3
Views: 1965
Reputation: 10406
You simply need to flatten the lists of tuples (the value of each pair in rddA), drop the original String key, and group by the first element of each inner tuple.
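import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;

JavaPairRDD<Integer, List<Integer>> result = rddA
    // flatten each list into (word, (a, b)) pairs; on Spark 3.x, where
    // flatMapValues expects an Iterator, this may need x -> x.iterator()
    .flatMapValues(x -> x)
    .mapToPair(x -> x._2)   // drop the String key, keeping (a, b)
    .groupByKey()           // (a, Iterable of b)
    .mapValues(x -> {       // turn the Iterable into a List
        List<Integer> list = new ArrayList<>();
        x.forEach(list::add);
        return list;
    });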
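To check the flatten-and-group logic without a Spark cluster, the same steps can be sketched on plain Java collections. This is only an illustration, not Spark code: the class RegroupSketch and the helpers pair and regroup are hypothetical names, and a LinkedHashMap stands in for the collected RDD.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RegroupSketch {

    // Hypothetical helper: builds one (first, second) pair, standing in for Tuple2.
    static Map.Entry<Integer, Integer> pair(int first, int second) {
        return new SimpleImmutableEntry<>(first, second);
    }

    // Flattens every value list, ignores the String key, and groups the second
    // elements by the first element, keeping encounter order.
    static Map<Integer, List<Integer>> regroup(
            Map<String, List<Map.Entry<Integer, Integer>>> input) {
        Map<Integer, List<Integer>> grouped = new LinkedHashMap<>();
        for (List<Map.Entry<Integer, Integer>> pairs : input.values()) {
            for (Map.Entry<Integer, Integer> p : pairs) {
                grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>())
                       .add(p.getValue());
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Same sample data as in the question.
        Map<String, List<Map.Entry<Integer, Integer>>> input = new LinkedHashMap<>();
        input.put("word1", Arrays.asList(pair(187, 267), pair(224, 311), pair(187, 110)));
        input.put("word2", Arrays.asList(pair(187, 200), pair(10, 90)));

        // prints {187=[267, 110, 200], 224=[311], 10=[90]}
        System.out.println(regroup(input));
    }
}
```

The local version is deterministic because it walks the input in insertion order. With groupByKey on a real RDD, neither the order of the keys nor the order of the values inside each list is guaranteed, so compare results as sets or sort them first if order matters.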
Upvotes: 1