Reputation: 111
I have a list of key-value pairs, such as List((A,1),(B,2),(C,3)), in heap memory. How can I parallelize this list to create a JavaPairRDD? In Scala: val pairs = sc.parallelize(List((A,1),(B,2),(C,3))). Is there an equivalent in the Java API?
Upvotes: 0
Views: 5065
Reputation: 189
This works for me:
sc.parallelizePairs(Arrays.asList(new Tuple2<>("123", "123")));
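For anyone who wants something they can paste and run, here is a minimal sketch of the same idea with explicit type parameters. The class name, method name, app name, and the local master setting are my own choices for illustration, not anything from the question; it assumes Spark's Java API and the Scala library are on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class ParallelizePairsExample {

    // Builds a JavaPairRDD from an in-memory list of tuples and
    // collects it back to the driver as a Map to show the round trip.
    static Map<String, Integer> roundTrip(JavaSparkContext sc) {
        List<Tuple2<String, Integer>> pairs = Arrays.asList(
                new Tuple2<>("A", 1),
                new Tuple2<>("B", 2),
                new Tuple2<>("C", 3));

        // parallelizePairs takes a List of Tuple2 and yields a JavaPairRDD directly.
        JavaPairRDD<String, Integer> pairRdd = sc.parallelizePairs(pairs);

        return pairRdd.collectAsMap();
    }

    public static void main(String[] args) {
        // Assumption: run locally with all available cores; adjust for a real cluster.
        SparkConf conf = new SparkConf()
                .setAppName("ParallelizePairsExample")
                .setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            System.out.println(roundTrip(sc));
        }
    }
}
```

Declaring the list as List<Tuple2<String, Integer>> rather than using raw Tuple2 lets the compiler infer JavaPairRDD<String, Integer> and avoids unchecked warnings.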
Upvotes: 1
Reputation: 1932
Put the tuple into a List, as in the snippet below:
Tuple2<Sensor, Integer> tuple = new Tuple2<Sensor, Integer>(arg0._2, 1);
List<Tuple2<Sensor, Integer>> list = new ArrayList<Tuple2<Sensor, Integer>>();
list.add(tuple);
Upvotes: 0
Reputation: 111
I found the answer. First store the list of tuples in a JavaRDD, then convert it to a JavaPairRDD.
List<Tuple2<String, Integer>> data = Arrays.asList(new Tuple2<>("panda", 0), new Tuple2<>("panda", 1));
JavaRDD<Tuple2<String, Integer>> rdd = sc.parallelize(data);
JavaPairRDD<String, Integer> pairRdd = JavaPairRDD.fromJavaRDD(rdd);
Upvotes: 2
Reputation: 3890
Parallelized collections are created by calling JavaSparkContext’s parallelize method on an existing Collection in your driver program. The elements of the collection are copied to form a distributed dataset that can be operated on in parallel.
List data = ......;
JavaRDD rdd = sc.parallelize(data);
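To make that concrete, here is a small sketch with an actual list in place of the elided one. The integer data, the class and method names, and the local master are assumptions of mine for illustration. It also shows mapToPair as one way to get from a plain JavaRDD to a JavaPairRDD, which is what the question was after.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class ParallelizeExample {

    // Distributes a driver-side list and sums its elements in parallel.
    static int sum(JavaSparkContext sc) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> rdd = sc.parallelize(data);
        return rdd.reduce((a, b) -> a + b);
    }

    // Turns each element into a (value, 1) pair, giving a JavaPairRDD,
    // and returns how many pairs were produced.
    static long countPairs(JavaSparkContext sc) {
        JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
        JavaPairRDD<Integer, Integer> pairs = rdd.mapToPair(x -> new Tuple2<>(x, 1));
        return pairs.count();
    }

    public static void main(String[] args) {
        // Assumption: local mode, for trying this out on a single machine.
        SparkConf conf = new SparkConf()
                .setAppName("ParallelizeExample")
                .setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            System.out.println(sum(sc));        // prints 15
            System.out.println(countPairs(sc)); // prints 5
        }
    }
}
```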
Upvotes: 0