SrtoPeixet

Reputation: 29

JavaPairRDD convert key-value into key-list

I have a JavaPairRDD of (Key, Value) pairs that I want to group by key, so that the "second column" becomes a list of all values seen for that key. I am currently using the groupby() function, which groups by key correctly but converts my values to an Iterable of Long. That is,

Key1 Iterable<Long>
Key2 Iterable<Long>
...

Is there any way to force this function to use a List of Longs instead of an Iterable object?

Key1 List<Long>
Key2 List<Long>
...

I read something about a function called combineByKey(), but I don't think it fits this use case. I probably need to use reduceByKey, but I can't see how. It should be something like this:

myRDD.reduceByKey((a,b) -> new ArrayList<Long>()) //and add b to a 

In the end, I want to combine the values to obtain an RDD of (Key, List<Long>) pairs. Thank you for your time.

Upvotes: 1

Views: 458

Answers (1)

blackbishop

Reputation: 32720

You can try something like this:

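// Assumes rdd is a JavaPairRDD<String, Long>.
JavaPairRDD<String, List<Long>> keyValuePairs = rdd
    .mapValues(v -> {
        // Wrap each value in a mutable list; Arrays.asList is fixed-size, so addAll would fail on it.
        List<Long> list = new ArrayList<>();
        list.add(v);
        return list;
    })
    .reduceByKey((a, b) -> {
        a.addAll(b);
        return a;
    });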

First, map each value to a single-element list of Longs. Then use reduceByKey to merge the lists for each key, combining them with the addAll method of ArrayList.
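
To see what the wrap-then-merge logic does without a cluster, here is a minimal plain-Java sketch of the same two steps. It does not use Spark: the class KeyListSketch and the helper toKeyLists are hypothetical names made up for illustration, and Map.merge stands in for reduceByKey. It needs Java 9 or later for List.of and Map.entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this is not Spark code.
public class KeyListSketch {

    // Hypothetical helper: turns (key, value) pairs into key -> list of values.
    static Map<String, List<Long>> toKeyLists(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            // "map" step: wrap the value in a mutable single-element list.
            List<Long> single = new ArrayList<>();
            single.add(pair.getValue());
            // "reduce" step: merge lists that share a key using addAll.
            result.merge(pair.getKey(), single, (a, b) -> {
                a.addAll(b);
                return a;
            });
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> pairs = List.of(
                Map.entry("Key1", 1L),
                Map.entry("Key2", 5L),
                Map.entry("Key1", 2L));
        // prints {Key1=[1, 2], Key2=[5]}
        System.out.println(toKeyLists(pairs));
    }
}
```

The sketch processes pairs sequentially, so values keep their input order. In Spark, the order of values within each list depends on how the data is partitioned and should not be relied on.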

Upvotes: 1
