Reputation: 69
I've always wondered if having a Dataset of a parameterised/generic class is possible in Java. To be more clear, what I am looking to achieve is something like this:
Dataset<MyClass<Integer>> myClassInteger;
Dataset<MyClass<String>> myClassString;
Please let me know if this is possible. If you could also show me how to achieve this, I would be very appreciative. Thanks!
Upvotes: 2
Views: 842
Reputation: 1276
Sorry this question is old, but I wanted to put some notes down, since I was able to work with generic/parameterized classes for Datasets in Java by creating a generic class that takes a type parameter and putting the methods inside that parameterized class, i.e. class MyClassProcessor<T1>, where T1 could be Integer or String.
Unfortunately, you will not enjoy the full benefits of generic types in this case, and you will have to perform some workarounds:
- Use Encoders.kryo(); otherwise, with some operations, the generic types became Object and could not be cast correctly to the generic type.
- Use an extra map step to convert to the concrete types. For example, I read TypeA and later worked with Dataset<MyClass<TypeA>>, using TypeA.class and raw types for certain map functions, etc.
Upvotes: 1
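The workarounds above can be sketched roughly as follows. This is a minimal, untested illustration, not the answerer's actual code: the names MyClassProcessor and MyClass (with its payload field) are hypothetical, and the unchecked cast around Encoders.kryo() is one way to bridge Java's type erasure, assuming the Spark Java API (SparkSession.createDataset and Encoders.kryo).

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import java.io.Serializable;
import java.util.List;

// Hypothetical generic wrapper class; T1 could be Integer or String.
public class MyClassProcessor<T1 extends Serializable> implements Serializable {

    // Hypothetical payload holder standing in for "MyClass" from the answer.
    public static class MyClass<T> implements Serializable {
        public T payload;
        public MyClass() { }
        public MyClass(T payload) { this.payload = payload; }
    }

    @SuppressWarnings("unchecked")
    public Dataset<MyClass<T1>> toDataset(SparkSession spark, List<MyClass<T1>> data) {
        // Kryo encoder workaround: T1 is erased at runtime, so a bean encoder
        // cannot describe MyClass<T1>; kryo serializes the whole object instead.
        Encoder<MyClass<T1>> encoder =
                (Encoder<MyClass<T1>>) (Encoder<?>) Encoders.kryo(MyClass.class);
        return spark.createDataset(data, encoder);
    }
}
```

Note that a kryo-encoded Dataset is stored as a single binary column, so you lose the columnar view that ds.show() gives with bean encoders; that matches the answer's caveat about not getting the full benefits of generics.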
Reputation: 1352
Yes, you can have a Dataset of your own class. It would look like Dataset<MyOwnClass>.
In the code below I have tried to read the content of a file into a Dataset of the class we have created. Please check the snippet below.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import java.io.Serializable;

public class FileDataset {

    public static class Employee implements Serializable {
        public int key;
        public int value;
    }

    public static void main(String[] args) {
        // configure spark
        SparkSession spark = SparkSession
                .builder()
                .appName("Reading JSON File into DataSet")
                .master("local[2]")
                .getOrCreate();

        final Encoder<Employee> employeeEncoder = Encoders.bean(Employee.class);
        final String jsonPath = "/Users/ajaychoudhary/Documents/student.txt";

        // read JSON file to Dataset
        Dataset<Employee> ds = spark.read()
                .json(jsonPath)
                .as(employeeEncoder);
        ds.show();
    }
}
The content of my student.txt file is
{ "key": 1, "value": 2 }
{ "key": 3, "value": 4 }
{ "key": 5, "value": 6 }
It produces the following output on the console:
+---+-----+
|key|value|
+---+-----+
| 1| 2|
| 3| 4|
| 5| 6|
+---+-----+
I hope this gives you an initial idea of how you can have the dataset of your own custom class.
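If you want to try this without depending on a local JSON file, a variant is to build the Dataset in memory with spark.createDataset. This is a sketch under the same Spark Java API assumptions as above; the class name InMemoryDataset and the constructor are illustrative, and the bean uses getter/setter pairs, which is what Encoders.bean expects by JavaBean convention.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import java.io.Serializable;
import java.util.Arrays;

public class InMemoryDataset {

    // JavaBean: no-arg constructor plus getters/setters for each column.
    public static class Employee implements Serializable {
        private int key;
        private int value;
        public Employee() { }
        public Employee(int key, int value) { this.key = key; this.value = value; }
        public int getKey() { return key; }
        public void setKey(int key) { this.key = key; }
        public int getValue() { return value; }
        public void setValue(int value) { this.value = value; }
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("In-memory DataSet")
                .master("local[2]")
                .getOrCreate();

        Encoder<Employee> employeeEncoder = Encoders.bean(Employee.class);

        // build the Dataset directly from a Java list instead of a file
        Dataset<Employee> ds = spark.createDataset(
                Arrays.asList(new Employee(1, 2), new Employee(3, 4), new Employee(5, 6)),
                employeeEncoder);
        ds.show();

        spark.stop();
    }
}
```

The resulting table should match the file-based version above, since the rows carry the same key/value pairs.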
Upvotes: -1