Reputation: 697
I ran into a problem using spark dataset! I keep getting the exception about encoders when I want to use case class the code is a simple one below:
case class OrderDataType (orderId: String, customerId: String, orderDate: String)
import spark.implicits._
val ds = spark.read.option("header", "true").csv("data\\orders.csv").as[OrderDataType]
I get this exception during compile:
Unable to find encoder for type OrderDataType. An implicit Encoder[OrderDataType] is needed to store OrderDataType instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
I have already added this: import spark.implicits._ but it doesn't solve the problem!
According to spark and scala documentation, the encoding must be done implicitly with scala!
What is wrong with this code and what should I do to fix it!
Upvotes: 2
Views: 435
Reputation: 29155
Other way is ... you can use every thing inside object Orders extends App
(intelligent enough to identify case class from out side of def main)
mydata/Orders.csv
orderId,customerId,orderDate
1,2,21/08/1977
1,2,21/08/1978
Example code :
package examples
import org.apache.log4j.Level
import org.apache.spark.sql._
object Orders extends App {
val logger = org.apache.log4j.Logger.getLogger("org")
logger.setLevel(Level.WARN)
val spark = SparkSession.builder.appName(getClass.getName)
.master("local[*]").getOrCreate
case class OrderDataType(orderId: String, customerId: String, orderDate: String)
import spark.implicits._
val ds1 = spark.read.option("header", "true").csv("mydata/Orders.csv").as[OrderDataType]
ds1.show
}
Result :
+-------+----------+----------+
|orderId|customerId| orderDate|
+-------+----------+----------+
| 1| 2|21/08/1977|
| 1| 2|21/08/1978|
+-------+----------+----------+
Why case class outside of def main ....
Seems like this is by design of the Encoder
from annotation
@implicitNotFound
below
Upvotes: 1
Reputation: 31460
Define your case class outside of main
method then in main method read the csv file and convert to dataset
.
Example:
case class OrderDataType (orderId: String, customerId: String, orderDate: String)
def main(args: Array[String]): Unit = {
val ds = spark.read.option("header", "true").csv("data\\orders.csv").as[OrderDataType]
}
//or
def main(args: Array[String]): Unit = {
val ds = spark.read.option("header", "true").csv("data\\orders.csv").as[(String,String,String)]
}
Upvotes: 2