Reputation: 151
I am very new to Apache Spark. I am trying to create a JavaPairRDD
from a HashMap
. I have a HashMap
of type HashMap<String, HashMap<Integer, String>>
. How can I convert it into a JavaPairRDD
? I have pasted my code below:
HashMap<String, HashMap<Integer, String>> canlist =
        new HashMap<String, HashMap<Integer, String>>();
for (String key : entityKey) {
    HashMap<Integer, String> clkey = new HashMap<Integer, String>();
    int f = 0;
    for (String val : mentionKey) {
        // do something
        simiscore = (longerLength - costs[m.length()]) / (double) longerLength;
        if (simiscore > 0.6) {
            clkey.put(v1, val);
            System.out.print(
                    " The mention " + val + " added to link entity " + key);
        }
        f++;
        System.out.println("Scan Completed");
    }
    canlist.put(key, clkey);
    JavaPairRDD<String, HashMap<Integer, String>> rad;
    rad = context.parallelize(scala.collection.Seq(toScalaMap(canlist)));
}

public static <String, Object> Map<String, Object> toScalaMap(HashMap<String, Object> m) {
    return (Map<String, Object>) JavaConverters.mapAsScalaMapConverter(m).asScala().toMap(
            Predef.<Tuple2<String, Object>>conforms());
}
Upvotes: 3
Views: 10221
Reputation: 201
Here is a generic method for the conversion. Use JavaSparkContext.parallelizePairs()
with the result of this method.
// fromMapToListTuple2(): generic method to convert a Map<T1, T2> to a List<Tuple2<T1, T2>>
public static <T1, T2> List<Tuple2<T1, T2>> fromMapToListTuple2(Map<T1, T2> map)
{
    // list of tuples
    List<Tuple2<T1, T2>> list = new ArrayList<Tuple2<T1, T2>>();
    // loop through all key-value pairs and add them to the list
    for (Map.Entry<T1, T2> entry : map.entrySet())
    {
        // Tuple2 is not a traditional Java collection, but a single key-value pair
        Tuple2<T1, T2> tuple2 = new Tuple2<T1, T2>(entry.getKey(), entry.getValue());
        // populate the list with the created Tuple2
        list.add(tuple2);
    } // for
    return list;
} // fromMapToListTuple2
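For the map in the question, the call might look like this (a minimal sketch; `jsc` is assumed to be an already-created JavaSparkContext, and `canlist` the HashMap being converted):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

// assumes jsc (JavaSparkContext) and fromMapToListTuple2() are in scope
Map<String, HashMap<Integer, String>> canlist = new HashMap<>();
// ... populate canlist ...

// convert the map to a List<Tuple2<...>> and parallelize it as a pair RDD
JavaPairRDD<String, HashMap<Integer, String>> pairRdd =
        jsc.parallelizePairs(fromMapToListTuple2(canlist));
```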
Upvotes: 0
Reputation: 15297
Here is another way to convert a Java HashMap<String, HashMap<Integer,String>>
to a List<Tuple2<String, HashMap<Integer,String>>>
and pass it to the parallelizePairs() method of JavaSparkContext.
import scala.Tuple2;

List<Tuple2<String, HashMap<Integer, String>>> list =
        new ArrayList<Tuple2<String, HashMap<Integer, String>>>();
for (Map.Entry<String, HashMap<Integer, String>> entry : canlist.entrySet()) {
    list.add(new Tuple2<String, HashMap<Integer, String>>(entry.getKey(), entry.getValue()));
}
JavaPairRDD<String, HashMap<Integer, String>> javaPairRdd = jsc.parallelizePairs(list);
Upvotes: 1
Reputation: 27455
If you convert the HashMap
into a List<scala.Tuple2<String, HashMap<Integer, String>>>
(one Tuple2 per map entry), then you can use JavaSparkContext.parallelizePairs
.
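A minimal sketch of that conversion for the types in the question (assuming `jsc` is an existing JavaSparkContext and `canlist` is the HashMap to convert):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// build one Tuple2 per map entry
List<Tuple2<String, HashMap<Integer, String>>> pairs = new ArrayList<>();
for (Map.Entry<String, HashMap<Integer, String>> e : canlist.entrySet()) {
    pairs.add(new Tuple2<>(e.getKey(), e.getValue()));
}

// parallelize the list of tuples into a pair RDD
JavaPairRDD<String, HashMap<Integer, String>> rdd = jsc.parallelizePairs(pairs);
```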
Upvotes: 9