Reputation: 453
I am trying to read a CSV file in Scala using a Dataset, and after that I am performing some operations. But my code is throwing an error. Below is my code:
final case class AadharData(date:String,
registrar:String,
agency:String,
state:String,
district:String,
subDistrict:String,
pinCode:Int,
gender:String,
age:Int,
aadharGenerated:Int,
rejected:Int,
mobileNo:Double,
email:String)
val spark = SparkSession.builder().appName("GDP").master("local").getOrCreate()
import spark.implicits._
val a = spark.read.option("header", false).csv("D:\\BGH\\Spark\\aadhaar_data.csv").as[AadharData]
val b = a.map(rec=>{
(rec.registrar,1)
}).groupByKey(f=>f._1).collect()
And I am getting the below error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`date`' given input columns: [_c0, _c2, _c1, _c3, _c5, _c8, _c9, _c7, _c6, _c11, _c12, _c10, _c4];
Any help is appreciated. Thanks in advance.
Upvotes: 0
Views: 2613
Reputation: 41957
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`date`' given input columns: [_c0, _c2, _c1, _c3, _c5, _c8, _c9, _c7, _c6, _c11, _c12, _c10, _c4];
The above error occurs because you set the header option to false (.option("header", false)), so Spark generates the column names _c0, _c1, and so on. But when you cast the generated dataframe to the case class with .as[AadharData], the case class field names (date, registrar, ...) don't match those generated names. Thus the above error happened.
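You can see this with a quick check (reusing the path from the question): printing the schema of the raw read shows only the positional names, all typed as string, so there is no date column to resolve.
val raw = spark.read.option("header", false).csv("D:\\BGH\\Spark\\aadhaar_data.csv")
raw.printSchema()  // prints _c0 through _c12, all string -- no `date` column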
Solution
You should rename the columns to the names used in the case class and also tell Spark SQL to infer the schema, as:
// take the field names from the case class and apply them to the dataframe
val columnNames = classOf[AadharData].getDeclaredFields.map(x => x.getName)
val a = spark.read.option("header", false).option("inferSchema", true)
  .csv("D:\\BGH\\Spark\\aadhaar_data.csv").toDF(columnNames:_*).as[AadharData]
The above error should go away.
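As an alternative sketch (not part of the original answer), you could derive the schema directly from the case class encoder, so both the column names and types come from AadharData without a second pass over the file for inferSchema:
import org.apache.spark.sql.Encoders

// Build a StructType from the case class fields (names and types)
val schema = Encoders.product[AadharData].schema
val a = spark.read.option("header", false).schema(schema)
  .csv("D:\\BGH\\Spark\\aadhaar_data.csv").as[AadharData]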
Upvotes: 5