Gyanendra Dwivedi

Reputation: 5538

Spark Java API: How to convert JavaRDD to RDD type

I am trying to convert code written with the DataFrame API to the Dataset API. The problem is that I have a JavaRDD created as:

final JavaRDD<String> abcJavaRdd= jsc.textFile("/path/to/textfile");

But the createDataset method of the SQLContext class expects an RDD<T> rather than a JavaRDD<T>.

SQLContext sqlc = new SQLContext(jsc);
....
....
Encoder<Abc> abcEncoder= Encoders.bean(Abc.class);
Dataset<Abc> abcDataset= sqlc.createDataset(abcJavaRdd, abcEncoder);

The last line of the code above does not compile. How can I obtain an 'org.apache.spark.rdd.RDD' from an 'org.apache.spark.api.java.JavaRDD'?

I am using Java 1.8 with Apache Spark 1.6.1 on a MapR cluster.

Upvotes: 2

Views: 2314

Answers (1)

Gyanendra Dwivedi

Reputation: 5538

After digging through the API, I found the answer.

The org.apache.spark.api.java.JavaRDD class exposes a static method, toRDD, that converts a JavaRDD into an org.apache.spark.rdd.RDD, which is the type the createDataset method of SQLContext accepts.

// Unwrap the JavaRDD into the underlying Scala RDD before passing it to createDataset.
Encoder<Abc> abcEncoder = Encoders.bean(Abc.class);
Dataset<Abc> abcDataset = sqlc.createDataset(JavaRDD.toRDD(abcJavaRdd), abcEncoder);

Another way is to call the rdd() method on the JavaRDD instance itself, i.e. abcJavaRdd.rdd(), which returns the same underlying RDD.
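
For completeness, here is a minimal sketch that puts both conversions side by side. It assumes Spark is on the classpath and that the caller supplies a JavaSparkContext (for example one created with a local master). The class name JavaRddToRddExample, the method convertBothWays, and the sample data are made up for illustration, and it uses Encoders.STRING() on a JavaRDD<String> instead of the Abc bean so that it stays self-contained.

```java
import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SQLContext;

// Hypothetical example class, not part of the original question or answer.
public class JavaRddToRddExample {

    // Builds one Dataset per conversion style and returns both row counts.
    static long[] convertBothWays(JavaSparkContext jsc) {
        SQLContext sqlc = new SQLContext(jsc);

        // Sample data standing in for jsc.textFile("/path/to/textfile").
        JavaRDD<String> linesJavaRdd = jsc.parallelize(Arrays.asList("a", "b", "c"));

        // Option 1: static helper on JavaRDD.
        RDD<String> viaStatic = JavaRDD.toRDD(linesJavaRdd);

        // Option 2: instance method that exposes the wrapped RDD.
        RDD<String> viaInstance = linesJavaRdd.rdd();

        Dataset<String> first = sqlc.createDataset(viaStatic, Encoders.STRING());
        Dataset<String> second = sqlc.createDataset(viaInstance, Encoders.STRING());

        return new long[] { first.count(), second.count() };
    }
}
```

Both options hand createDataset the RDD that the JavaRDD wraps, so the choice between them is mostly a matter of style.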

Upvotes: 2
