Reputation: 75137
I have an RDD:
JavaPairRDD<Long, ViewRecord> myRDD
which is created via the newAPIHadoopRDD method. I have an existing map function that I want to reimplement the Spark way:
LongWritable one = new LongWritable(1L);
protected void map(Long key, ViewRecord viewRecord, Context context)
        throws IOException, InterruptedException {
    String url = viewRecord.getUrl();
    long day = viewRecord.getDay();
    tuple.getKey().set(url);
    tuple.getValue().set(day);
    context.write(tuple, one);
}
PS: tuple is derived from:
KeyValueWritable<Text, LongWritable>
and can be found here: TextLong.java
Upvotes: 1
Views: 193
Reputation: 1819
I don't know what tuple is, but if you just want to map each record to a pair with key (url, day) and value 1L, you can do it like this:
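// Java 8 style
JavaPairRDD<Tuple2<String, Long>, Long> result = myRDD
    .values()
    .mapToPair(viewRecord -> {
        String url = viewRecord.getUrl();
        long day = viewRecord.getDay();
        // key is (url, day), value is the constant 1L
        return new Tuple2<>(new Tuple2<>(url, day), 1L);
    });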
// Java 7 style (same key type as above: a nested Tuple2<String, Long>)
JavaPairRDD<Tuple2<String, Long>, Long> result = myRDD
    .values()
    .mapToPair(new PairFunction<ViewRecord, Tuple2<String, Long>, Long>() {
        @Override
        public Tuple2<Tuple2<String, Long>, Long> call(ViewRecord record) throws Exception {
            String url = record.getUrl();
            Long day = record.getDay();
            return new Tuple2<>(new Tuple2<>(url, day), 1L);
        }
    });
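To sanity-check the shape of the pairs without a cluster, here is a minimal plain-Java sketch of the same per-record logic. Everything in it is an assumption made for illustration: the ViewRecord class is a hypothetical stand-in with only the two getters used above, Map.Entry stands in for Tuple2, and the counting step only applies if the constant 1 is meant to be summed per (url, day), which the question suggests but does not state. In Spark, that last step would presumably be a reduceByKey on the resulting pair RDD.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ViewKeySketch {
    // Hypothetical stand-in for the question's ViewRecord (only the getters used above).
    static final class ViewRecord {
        private final String url;
        private final long day;

        ViewRecord(String url, long day) {
            this.url = url;
            this.day = day;
        }

        String getUrl() { return url; }
        long getDay() { return day; }
    }

    // Same logic as the mapToPair body: key (url, day), value 1L.
    static Map.Entry<Map.Entry<String, Long>, Long> toPair(ViewRecord record) {
        Map.Entry<String, Long> key =
                new SimpleImmutableEntry<>(record.getUrl(), record.getDay());
        return new SimpleImmutableEntry<>(key, 1L);
    }

    // Local analogue of summing the 1L values for each distinct key.
    static Map<Map.Entry<String, Long>, Long> countByKey(List<ViewRecord> records) {
        Map<Map.Entry<String, Long>, Long> counts = new HashMap<>();
        for (ViewRecord record : records) {
            Map.Entry<Map.Entry<String, Long>, Long> pair = toPair(record);
            counts.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<ViewRecord> records = Arrays.asList(
                new ViewRecord("/home", 1L),
                new ViewRecord("/home", 1L),
                new ViewRecord("/about", 2L));
        // Prints one entry per distinct (url, day) key with its count.
        System.out.println(countByKey(records));
    }
}
```

The key has to implement equals and hashCode by value for grouping to work; SimpleImmutableEntry does here, and scala.Tuple2 does in the Spark versions above.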
Upvotes: 2