Getting error while defining schema for csv file in spark using scala

Question

I am trying to define a schema for a CSV file using a case class in Scala.

import org.apache.spark.sql.SparkSession

case class userSchema(name : String,
                      place : String,
                      designation : String)
object userProcess {
  val spark = SparkSession.builder().appName("Spark_processing for Hbase").master("yarn").getOrCreate()
  import spark.implicits._
  // Field names of the case class, used as column names
  val colNames = classOf[userSchema].getDeclaredFields.map(f => f.getName)
  // Backslashes in the Windows path must be escaped inside a Scala string literal
  val file = spark.read.option("inferSchema", false).option("header", false).csv("D:\\wSapce\\User.csv").toDF(colNames:_*).as(userSchema)

}

But on the last line (for the value file) I am getting the compile-time error below:

overloaded method value as with alternatives: (alias: Symbol)org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]  (alias: String)org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]  [U](implicit evidence$2: org.apache.spark.sql.Encoder[U])org.apache.spark.sql.Dataset[U] cannot be applied to (tavant.user.userSchema.type)

Any idea why I am getting this error?

Lakshman Battini · Accepted Answer

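The issue lies in the line below:

val file = spark.read.option("inferSchema", false).option("header", false).csv("D:\\wSapce\\User.csv").toDF(colNames:_*).as(userSchema)

spark.read.option(...).csv(...) already returns a DataFrame, so toDF() is not needed to convert the result to a DataFrame. Because the file is read with header set to false, though, the columns come back named _c0, _c1, _c2, so toDF(colNames: _*) is still useful here: it renames them to match the fields of the case class.

The compile error comes from as(userSchema). The parentheses pass the companion object userSchema as a value argument (hence userSchema.type in the message), and none of the overloads of as accept it. To convert the DataFrame to a Dataset with the schema defined by the case class, pass the case class as a type parameter in square brackets, as[userSchema]:

val file = spark.read.option("inferSchema", false).option("header", false).csv("D:\\wSapce\\User.csv").toDF(colNames: _*).as[userSchema]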
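
As an alternative to collecting the column names by reflection, the schema can be derived from the case class and handed to the reader directly. The following is only a minimal sketch under some assumptions that are not in the question: it runs with a local master instead of yarn, the object name UserProcessSketch and the helper loadUsers are hypothetical, and a small temporary file with made-up rows stands in for User.csv.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Same fields as the case class in the question.
case class userSchema(name: String, place: String, designation: String)

object UserProcessSketch {

  // Hypothetical helper: reads a header-less CSV into a typed Dataset.
  def loadUsers(spark: SparkSession, path: String): Dataset[userSchema] = {
    import spark.implicits._ // provides the implicit Encoder[userSchema]

    // Derive the StructType (three string columns) from the case class,
    // so the columns are named name, place, designation from the start.
    val schema = Encoders.product[userSchema].schema

    spark.read
      .schema(schema)
      .option("header", false)
      .csv(path)
      .as[userSchema] // type parameter in square brackets, not as(userSchema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local master so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("csv-to-dataset-sketch")
      .master("local[*]")
      .getOrCreate()

    // Made-up sample data standing in for User.csv (no header row).
    val csvPath = Files.createTempFile("User", ".csv")
    Files.write(csvPath, "alice,Pune,engineer\nbob,Delhi,analyst\n".getBytes("UTF-8"))

    val users = loadUsers(spark, csvPath.toString)
    users.show()

    spark.stop()
  }
}
```

With an explicit schema the reader names the columns itself, so neither inferSchema nor toDF(colNames: _*) is needed in this variant.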
