Sandeep Veerlapati

Reputation: 111

How to parallelize a list of key-value pairs into a JavaPairRDD with the Apache Spark Java API?

I have a list of key-value pairs, such as List((A,1),(B,2),(C,3)), in heap memory. How can I parallelize this list to create a JavaPairRDD? In Scala: val pairs = sc.parallelize(List((A,1),(B,2),(C,3))). Is there an equivalent in the Java API?

Upvotes: 0

Views: 5065

Answers (4)

pranaygoyal02

Reputation: 189

This works for me:

sc.parallelizePairs(Arrays.asList(new Tuple2<String, String>("123", "123")));
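
For context, here is a minimal sketch of the same call in a complete program, using the three pairs from the question. The class name, the helper `toPairRdd`, the app name, and the `local[*]` master are assumptions made for illustration, and the lambda-free style is only one way to write it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

// Hypothetical example class; not part of any answer in this thread.
public class ParallelizePairsExample {

    // Builds a JavaPairRDD directly from an in-memory list of tuples.
    static JavaPairRDD<String, Integer> toPairRdd(
            JavaSparkContext sc, List<Tuple2<String, Integer>> pairs) {
        return sc.parallelizePairs(pairs);
    }

    public static void main(String[] args) {
        // Assumption: run locally, using all available cores.
        SparkConf conf = new SparkConf()
                .setAppName("ParallelizePairsExample")
                .setMaster("local[*]");

        // JavaSparkContext is Closeable, so try-with-resources stops it.
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<Tuple2<String, Integer>> pairs = Arrays.asList(
                    new Tuple2<String, Integer>("A", 1),
                    new Tuple2<String, Integer>("B", 2),
                    new Tuple2<String, Integer>("C", 3));

            JavaPairRDD<String, Integer> pairRdd = toPairRdd(sc, pairs);

            // Bring the pairs back to the driver to inspect them.
            Map<String, Integer> asMap = pairRdd.collectAsMap();
            System.out.println(asMap);
        }
    }
}
```

Giving the tuples explicit type parameters lets the compiler infer `JavaPairRDD<String, Integer>` instead of falling back to raw types.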

Upvotes: 1

Rajeev Rathor

Reputation: 1932

Put the tuple into a list with the snippet below; the list can then be passed to sc.parallelizePairs.

Tuple2<Sensor, Integer> tuple = new Tuple2<Sensor, Integer>(arg0._2, 1);
List<Tuple2<Sensor, Integer>> list = new ArrayList<Tuple2<Sensor, Integer>>();
list.add(tuple);

Upvotes: 0

Sandeep Veerlapati

Reputation: 111

I found the answer. First store the list of tuples in a JavaRDD, then convert it to a JavaPairRDD:

    List<Tuple2<String, Integer>> data = Arrays.asList(
        new Tuple2<String, Integer>("panda", 0),
        new Tuple2<String, Integer>("panda", 1));
    JavaRDD<Tuple2<String, Integer>> rdd = sc.parallelize(data);
    JavaPairRDD<String, Integer> pairRdd = JavaPairRDD.fromJavaRDD(rdd);

Have a look at this answer

Upvotes: 2

banjara

Reputation: 3890

Parallelized collections are created by calling JavaSparkContext’s parallelize method on an existing Collection in your driver program. The elements of the collection are copied to form a distributed dataset that can be operated on in parallel.

List data = ......;
JavaRDD rdd = sc.parallelize(data);

Upvotes: 0
