Reputation: 5538
I am trying to convert code written using the DataFrame API into the Dataset API.
The problem is that I have a JavaRDD created as:
final JavaRDD<String> abcJavaRdd = jsc.textFile("/path/to/textfile");
But the createDataset method of the SQLContext class expects an RDD<T> rather than a JavaRDD<T>:
SQLContext sqlc = new SQLContext(jsc);
....
....
Encoder<Abc> abcEncoder= Encoders.bean(Abc.class);
Dataset<Abc> abcDataset= sqlc.createDataset(abcJavaRdd, abcEncoder);
The last line in the above code does not compile.
I want to know how to create an org.apache.spark.rdd.RDD from an org.apache.spark.api.java.JavaRDD.
I am using Java 1.8 with Apache Spark 1.6.1 on a MapR cluster.
Upvotes: 2
Views: 2314
Reputation: 5538
After digging through the API, I found the answer.
The org.apache.spark.api.java.JavaRDD class exposes a static toRDD method that converts a JavaRDD into an org.apache.spark.rdd.RDD, which is what the createDataset method of the SQLContext class accepts:
Encoder<Abc> abcEncoder = Encoders.bean(Abc.class);
Dataset<Abc> abcDataset = sqlc.createDataset(JavaRDD.toRDD(abcJavaRdd), abcEncoder);
Another way is to call the rdd() method on abcJavaRdd, i.e. abcJavaRdd.rdd().
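Putting it together, here is a minimal self-contained sketch of the conversion (the Abc bean, the file path, and the map step are illustrative assumptions, since the original question does not show the Abc class; assumes Spark 1.6.x is on the classpath):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SQLContext;

public class JavaRddToDataset {

    // Illustrative bean: Encoders.bean requires a public class with a
    // no-arg constructor and getter/setter pairs for each field.
    public static class Abc implements java.io.Serializable {
        private String value;
        public Abc() {}
        public Abc(String value) { this.value = value; }
        public String getValue() { return value; }
        public void setValue(String value) { this.value = value; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("JavaRddToDataset")
                .setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqlc = new SQLContext(jsc);

        // Read raw lines, then map each line into the bean type
        // (the mapping step is an assumption for illustration).
        JavaRDD<String> lines = jsc.textFile("/path/to/textfile");
        JavaRDD<Abc> abcJavaRdd = lines.map(Abc::new);

        Encoder<Abc> abcEncoder = Encoders.bean(Abc.class);

        // Bridge from JavaRDD<Abc> to RDD<Abc>: either the static
        // JavaRDD.toRDD(abcJavaRdd) or the instance abcJavaRdd.rdd().
        Dataset<Abc> abcDataset = sqlc.createDataset(JavaRDD.toRDD(abcJavaRdd), abcEncoder);
        abcDataset.show();

        jsc.stop();
    }
}
```

Both bridges return the same underlying RDD; toRDD is simply a static wrapper around rdd(), so which one to use is a matter of style.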
Upvotes: 2