Reputation: 453
I am trying to define a schema for a CSV file using a case class
in Scala.
case class userSchema(name: String,
                      place: String,
                      designation: String)

object userProcess {
  val spark = SparkSession.builder().appName("Spark_processing for Hbase").master("yarn").getOrCreate()
  import spark.implicits._
  val colNames = classOf[userSchema].getDeclaredFields.map(f => f.getName)
  val file = spark.read.option("inferSchema", false).option("header", false).csv("D:\\wSapce\\User.csv").toDF(colNames:_*).as(userSchema)
}
But on the last line (for the value file) I get the following compile-time error:
overloaded method value as with alternatives: (alias: Symbol)org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and> (alias: String)org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and> [U](implicit evidence$2: org.apache.spark.sql.Encoder[U])org.apache.spark.sql.Dataset[U] cannot be applied to (tavant.user.userSchema.type)
Any idea why I am getting this error?
Upvotes: 0
Views: 218
Reputation: 1912
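The issue lies in the following line:
val file = spark.read.option("inferSchema", false).option("header", false).csv("D:\\wSapce\\User.csv").toDF(colNames:_*).as(userSchema)
spark.read.option().csv already returns a DataFrame, so toDF() is not needed for the conversion itself. It is still useful here, though: with header set to false the columns are named _c0, _c1, _c2, and toDF(colNames:_*) renames them to match the case class fields.
The compile error comes from as(userSchema). Written with parentheses, userSchema refers to the companion object (hence userSchema.type in the message), which matches none of the overloads. To convert the DataFrame to a Dataset with the schema defined by the case class, pass the type in square brackets, as[userSchema]:
val file = spark.read.option("inferSchema", false).option("header", false).csv("D:\\wSapce\\User.csv").toDF(colNames:_*).as[userSchema]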
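As a complement, here is a minimal sketch of one way to let the case class drive the schema directly, so the column names do not have to be collected by reflection. It assumes Spark 2.x or later; the object name UserProcessSketch, the local[*] master, the temporary file, and the sample rows are all made up for illustration and are not from the question.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Same fields as the case class in the question; kept at top level
// so that an encoder can be derived for it.
case class userSchema(name: String, place: String, designation: String)

object UserProcessSketch {
  def main(args: Array[String]): Unit = {
    // local[*] is an assumption so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("case-class-schema-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data written to a temporary file.
    val path = Files.createTempFile("users", ".csv")
    Files.write(path, "alice,Pune,engineer\nbob,Delhi,analyst\n".getBytes("UTF-8"))

    // Derive the StructType from the case class instead of listing names by hand.
    val schema = Encoders.product[userSchema].schema

    val users: Dataset[userSchema] = spark.read
      .option("header", false)
      .schema(schema)       // columns are named name, place, designation
      .csv(path.toString)
      .as[userSchema]       // type parameter in square brackets, not the companion object

    users.show()
    spark.stop()
  }
}
```

Because the schema is supplied up front, the columns already carry the case class field names when as[userSchema] is applied, so no toDF rename is needed in this variant.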
Upvotes: 2