Reputation: 3590
I have an RDD and need to convert it into a Dataset. I tried:
Dataset<Person> personDS = sqlContext.createDataset(personRDD, Encoders.bean(Person.class));
The line above throws this error:
cannot resolve method createDataset(org.apache.spark.api.java.JavaRDD<Main.Person>, org.apache.spark.sql.Encoder<T>)
However, I can convert it to a Dataset after first converting it to a DataFrame. The code below works:
Dataset<Row> personDF = sqlContext.createDataFrame(personRDD, Person.class);
Dataset<Person> personDS = personDF.as(Encoders.bean(Person.class));
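Both Encoders.bean(Person.class) and createDataFrame(personRDD, Person.class) work by introspecting Person as a JavaBean. Since the error message names Main.Person, Person is a nested class here, so it needs to be public static, with a public no-argument constructor and a getter/setter pair for each property. A minimal sketch of such a bean, assuming hypothetical name and age properties (the question does not show the real fields):

```java
import java.io.Serializable;

public class Main {

    // A JavaBean that Encoders.bean(Person.class) can introspect:
    // public and static (because it is nested), with a public no-arg
    // constructor and a getter/setter pair per property.
    // Serializable so the objects can be shipped between JVMs inside an RDD.
    // The "name" and "age" properties are hypothetical placeholders.
    public static class Person implements Serializable {
        private String name;
        private int age;

        // No-arg constructor: lets Spark instantiate the bean when decoding rows
        public Person() { }

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    public static void main(String[] args) {
        Person p = new Person("Alice", 30);
        System.out.println(p.getName() + " " + p.getAge());
    }
}
```

If Person were a non-static inner class, or lacked the no-arg constructor or setters, the bean encoder could not construct or populate it, so it is worth checking the class against these rules before looking at the RDD conversion itself.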
Upvotes: 10
Views: 23539
Reputation: 454
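import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

StructType schema = new StructType()
.add("Id", DataTypes.StringType)
.add("Name", DataTypes.StringType)
.add("Country", DataTypes.StringType);
Dataset<Row> dataSet = sqlContext.createDataFrame(yourJavaRDD, schema); // yourJavaRDD must be a JavaRDD<Row> matching the schema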
Be careful with the schema: it is not always easy to predict which data type each column needs, so sometimes it is simpler to use StringType for every column.
Upvotes: -1
Reputation: 3633
In addition to the accepted answer, if you want to create a Dataset<Row> instead of a Dataset<Person> in Java, try this:
StructType yourStruct = ...; // create your own StructType based on the individual field types
// RowEncoder encodes Row objects, so personRDD must be a JavaRDD<Row> here
Dataset<Row> personDS = sqlContext.createDataset(personRDD.rdd(), RowEncoder.apply(yourStruct));
Upvotes: 1
Reputation: 3590
.createDataset() accepts an RDD<T>, not a JavaRDD<T>. JavaRDD is a wrapper around RDD that makes calls from Java code easier. It holds an RDD internally, which you can access with .rdd(). The following creates a Dataset:
Dataset<Person> personDS = sqlContext.createDataset(personRDD.rdd(), Encoders.bean(Person.class));
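To see why the compiler reports "cannot resolve method" rather than converting automatically, here is a toy model of the relationship described above. ToyRdd, ToyJavaRdd and this createDataset are hypothetical stand-ins, NOT Spark's classes or API; they only mimic a wrapper that holds an inner object and exposes it through rdd(), and a method whose parameter type is the inner type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these are NOT Spark classes.
public class WrapperDemo {

    // Hypothetical stand-in for org.apache.spark.rdd.RDD<T>
    static final class ToyRdd<T> {
        final List<T> data;
        ToyRdd(List<T> data) { this.data = data; }
    }

    // Hypothetical stand-in for JavaRDD<T>: wraps a ToyRdd and exposes it via rdd()
    static final class ToyJavaRdd<T> {
        private final ToyRdd<T> inner;
        ToyJavaRdd(ToyRdd<T> inner) { this.inner = inner; }
        ToyRdd<T> rdd() { return inner; }
    }

    // Hypothetical stand-in for createDataset(RDD<T>, Encoder<T>):
    // the parameter type is the unwrapped ToyRdd, so a ToyJavaRdd is rejected at compile time
    static <T> List<T> createDataset(ToyRdd<T> rdd) {
        return new ArrayList<>(rdd.data);
    }

    public static void main(String[] args) {
        ToyJavaRdd<String> people = new ToyJavaRdd<>(new ToyRdd<>(Arrays.asList("Alice", "Bob")));
        // createDataset(people);  // would not compile: ToyJavaRdd is not a ToyRdd
        List<String> ds = createDataset(people.rdd());  // unwrap first, as in the answer
        System.out.println(ds);
    }
}
```

The wrapper and the wrapped type are unrelated classes as far as Java's type checker is concerned, so the only fix is to unwrap explicitly, which is exactly what .rdd() does in the real code.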
Upvotes: 18
Reputation: 2853
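In Scala, call .toDS() on your RDD (after import spark.implicits._) and you will get a Dataset. Note that .toDS() comes from those Scala implicit conversions; it is not a method of JavaRDD.
Let me know if it helps. Cheers.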
Upvotes: 1