Reputation: 77
1) For Categories
twitter_handle , categories , sub_categories
handle , Products , MakeUp
handle , Health , MakeUp
handle2 , Services , Face
handle3 , Marketing , Soap
JavaPairRDD<String ,Category> categoryPairRDD
2) For Twitter
twitter_handle , twitter_post , twitter_likes
handle , Iphone , 10
handle2 , Samsung , 20
JavaPairRDD<String ,Twitter> twitterPairRDD
JavaPairRDD<String, Tuple2<Iterable<Category>, Iterable<Twitter>>> grouped =
    categoryPairRDD.cogroup(twitterPairRDD);
How should I iterate over the cogroup values so that, for each key, the values are printed when a matching object is found and null values are printed otherwise?
For example, handle3 is present in categoryPairRDD but absent from twitterPairRDD, so the output for key handle3 should be
handle3 , Marketing , Soap , null , null
The final output should be
handle , Products , MakeUp , Iphone , 10
handle , Health , MakeUp , Iphone , 10
handle2 , Services , Face , Samsung , 20
handle3 , Marketing , Soap , null , null
Upvotes: 1
Views: 417
Reputation: 77
I managed to get a solution using leftOuterJoin:
JavaPairRDD<String, Tuple2<Ontologies, Optional<Twitter>>> left = ontologiesPair.leftOuterJoin(twitterPairRDD);
left.foreach(new VoidFunction<Tuple2<String, Tuple2<Ontologies, Optional<Twitter>>>>() {
    @Override
    public void call(Tuple2<String, Tuple2<Ontologies, Optional<Twitter>>> arg0) throws Exception {
        Ontologies ontology = arg0._2()._1();
        Optional<Twitter> tweet = arg0._2()._2();
        if (tweet.isPresent()) {
            // The key exists in both RDDs: print values from ontology and tweet.get()
        } else {
            // The key is missing from twitterPairRDD: use an empty placeholder tweet
            Twitter emptyTweet = new Twitter("", -1);
            // Print values from ontology and emptyTweet
        }
    }
});
I would still like to see an answer that uses cogroup.
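For the cogroup route, here is a minimal sketch of the per-key iteration logic, written in plain Java so it runs without a Spark cluster. It treats the two Iterables that cogroup produces for each key as inputs and emits one row per category/tweet combination, or a row padded with null values when the tweet side is empty. The class name CogroupRows, the helper expand, and the fields of the Category and Twitter stand-ins are assumptions made for illustration; they are not taken from the original classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CogroupRows {
    // Hypothetical stand-in for the Category value class in the question.
    static final class Category {
        final String category;
        final String subCategory;
        Category(String category, String subCategory) {
            this.category = category;
            this.subCategory = subCategory;
        }
    }

    // Hypothetical stand-in for the Twitter value class in the question.
    static final class Twitter {
        final String post;
        final int likes;
        Twitter(String post, int likes) {
            this.post = post;
            this.likes = likes;
        }
    }

    // Expands one cogroup entry (key, categories, tweets) into output rows.
    // Keys with no category produce no rows, which mirrors a left outer join.
    static List<String> expand(String key, Iterable<Category> categories, Iterable<Twitter> tweets) {
        List<String> rows = new ArrayList<>();
        boolean hasTweets = tweets.iterator().hasNext();
        for (Category c : categories) {
            String prefix = key + " , " + c.category + " , " + c.subCategory;
            if (!hasTweets) {
                // Key is absent on the Twitter side: pad with null values.
                rows.add(prefix + " , null , null");
                continue;
            }
            for (Twitter t : tweets) {
                rows.add(prefix + " , " + t.post + " , " + t.likes);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Key "handle" has two categories and one tweet.
        List<Category> handleCategories = Arrays.asList(
                new Category("Products", "MakeUp"), new Category("Health", "MakeUp"));
        List<Twitter> handleTweets = Arrays.asList(new Twitter("Iphone", 10));
        for (String row : expand("handle", handleCategories, handleTweets)) {
            System.out.println(row);
        }
        // Key "handle3" has a category but no tweets.
        List<Category> handle3Categories = Arrays.asList(new Category("Marketing", "Soap"));
        for (String row : expand("handle3", handle3Categories, Collections.<Twitter>emptyList())) {
            System.out.println(row);
        }
    }
}
```

On the actual RDD, the same logic would presumably go inside a flatMap over grouped, with the key taken from _1() and the two Iterables from _2()._1() and _2()._2(). Note that the flatMap callback returns an Iterable in Spark 1.x and an Iterator in Spark 2.x, so the return statement would need adjusting to the version in use.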
Upvotes: 1