Reputation: 464
I have a JavaRDD. When I print it, the data looks like this: [[String1,String2,String3],[String4],[String5,String6],[String7,String8,String9]]
Each String is itself a pipe-separated string, which I can split to form a key and a value.
How can I convert this RDD to a JavaPairRDD?
Upvotes: 0
Views: 2863
Reputation: 1922
The snippet below uses mapToPair to transform a JavaRDD<Sensor> into a JavaPairRDD<Integer, Sensor>, keying each Sensor by its parsed ID:
JavaPairRDD<Integer, Sensor> deviceRdd = sensorRdd.mapToPair(new PairFunction<Sensor, Integer, Sensor>() {
    @Override
    public Tuple2<Integer, Sensor> call(Sensor sensor) throws Exception {
        // Key: the sensor ID parsed as an int; value: the Sensor itself.
        return new Tuple2<Integer, Sensor>(Integer.parseInt(sensor.getsId().trim()), sensor);
    }
});
Upvotes: 0
Reputation: 2995
Assuming you have data like this in a JavaRDD<List<String>>:
List_0: ["sub10~sub11~sub12","sub20~sub21~sub22","sub30~sub31~sub32"]
List_1: ["sub40~sub41~sub42"]
where ~ is the separator.
Say you want to flatten the lists and, for each input string, join its first and third substrings with | to form the key, then store the pairs in a JavaPairRDD<String,String>:
key: "sub10|sub12" value: "sub10~sub11~sub12"
You could achieve this by using flatMap and then mapToPair:
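JavaPairRDD<String, String> pairs = rdd.flatMap(new FlatMapFunction<List<String>, String>() {
    @Override
    public Iterable<String> call(List<String> li) throws Exception {
        // Spark 1.x signature. In Spark 2.0+, call() returns Iterator<String>,
        // so this becomes: return li.iterator();
        return li;
    }
}).mapToPair(new PairFunction<String, String, String>() {
    @Override
    public Tuple2<String, String> call(String s) throws Exception {
        // Assumes every string has at least three ~-separated fields.
        String[] ss = s.split("~");
        return new Tuple2<String, String>(ss[0] + "|" + ss[2], s);
    }
});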
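To check the flatten-and-key logic locally without a Spark context, here is a minimal plain-Java sketch of the same two steps. It is not Spark code: java.util.stream stands in for the RDD operations, Map.Entry stands in for Tuple2, and the names PairSketch, toPair, and toPairs are hypothetical, chosen only for this illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PairSketch {

    // Hypothetical helper: builds the "first|third" key for one ~-separated record.
    static Map.Entry<String, String> toPair(String record) {
        String[] parts = record.split("~");
        if (parts.length < 3) {
            // The original snippet would throw ArrayIndexOutOfBoundsException here.
            throw new IllegalArgumentException("expected at least 3 fields: " + record);
        }
        return new SimpleEntry<>(parts[0] + "|" + parts[2], record);
    }

    // Mirrors flatMap followed by mapToPair: flatten the nested lists, then key each string.
    static List<Map.Entry<String, String>> toPairs(List<List<String>> nested) {
        return nested.stream()
                .flatMap(List::stream)
                .map(PairSketch::toPair)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> data = Arrays.asList(
                Arrays.asList("sub10~sub11~sub12", "sub20~sub21~sub22", "sub30~sub31~sub32"),
                Arrays.asList("sub40~sub41~sub42"));
        for (Map.Entry<String, String> e : toPairs(data)) {
            System.out.println("key: " + e.getKey() + " value: " + e.getValue());
        }
    }
}
```

The length check in toPair is an addition for this sketch; the same guard could be placed inside the PairFunction if some input strings may have fewer than three fields.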
Upvotes: 1