Reputation: 641
I have a JavaPairRDD<String, Iterable<Tuple2<String, String>>>.
I wrote it to a file, and the content is:
(ABC,[(ABC,1)])
(BBC,[(BBC,1)])
(CBD,[(CBD,1)])
(BBD,[(BBD,1)])
(ACD,[(ACD,1)])
Now I want to extract only the keys (ABC, BBC, CBD, BBD, ACD) into a JavaRDD<String> and write them to a file.
So far I can only print them to the console using foreach:
pairRdd.foreach(new VoidFunction<Tuple2<String, Iterable<Tuple2<String, String>>>>() {
    @Override
    public void call(Tuple2<String, Iterable<Tuple2<String, String>>> t) throws Exception {
        // Print only the key of each pair; the grouped values are ignored.
        System.out.println(t._1);
    }
});
I want to write the same output to a file instead. I am new to Spark, so I don't know how to achieve this. Any help would be much appreciated. Thanks in advance.
Upvotes: 0
Views: 1506
Reputation: 564
Try taking the keys and saving them as a text file:
pairRdd.keys().coalesce(1).saveAsTextFile("some_path");
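Here keys() gives a JavaRDD<String> holding only the first element of each pair, coalesce(1) merges it into a single partition so only one part file is written, and saveAsTextFile writes one element per line. For reference, a minimal self-contained sketch of the same idea is below. The class name KeysToFile, the local master setting, the sample data, and the default output path "keys_output" are assumptions for illustration, not something from your code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

// Hypothetical class name, used only for this sketch.
public class KeysToFile {

    public static void main(String[] args) {
        // Assumed output location; saveAsTextFile fails if it already exists.
        String outputDir = args.length > 0 ? args[0] : "keys_output";

        // Local mode is an assumption so the sketch runs without a cluster.
        SparkConf conf = new SparkConf().setAppName("KeysToFile").setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Sample data shaped like the RDD in the question: (ABC,[(ABC,1)]), ...
            List<Tuple2<String, Iterable<Tuple2<String, String>>>> data = new ArrayList<>();
            for (String key : Arrays.asList("ABC", "BBC", "CBD", "BBD", "ACD")) {
                Iterable<Tuple2<String, String>> values =
                        Collections.singletonList(new Tuple2<>(key, "1"));
                data.add(new Tuple2<>(key, values));
            }
            JavaPairRDD<String, Iterable<Tuple2<String, String>>> pairRdd =
                    sc.parallelizePairs(data);

            // Keep only the keys, dropping the grouped values.
            JavaRDD<String> keys = pairRdd.keys();

            // One partition means one part file in the output directory.
            keys.coalesce(1).saveAsTextFile(outputDir);
        }
    }
}
```

Note that the path passed to saveAsTextFile is a directory, not a file: the keys end up in a file named part-00000 inside it, one key per line. Also keep in mind that coalesce(1) pushes all the data through a single task, which is fine for a small result like this but can be slow for a large RDD.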
Upvotes: 0