Reputation: 453
I am trying to read a CSV file in Scala using a Dataset, and after that I am performing some operations. But my code is throwing an error. Below is my code:
final case class AadharData(date:String,
registrar:String,
agency:String,
state:String,
district:String,
subDistrict:String,
pinCode:Int,
gender:String,
age:Int,
aadharGenerated:Int,
rejected:Int,
mobileNo:Double,
email:String)
val spark = SparkSession.builder().appName("GDP").master("local").getOrCreate()
import spark.implicits._
val a = spark.read.option("header", false).csv("D:\\BGH\\Spark\\aadhaar_data.csv").as[AadharData]
val b = a.map(rec=>{
(rec.registrar,1)
}).groupByKey(f=>f._1).collect()
And I am getting the below error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`date`' given input columns: [_c0, _c2, _c1, _c3, _c5, _c8, _c9, _c7, _c6, _c11, _c12, _c10, _c4];
Any help is appreciated. Thanks in advance.
Upvotes: 0
Views: 2613
Reputation: 41957
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`date`' given input columns: [_c0, _c2, _c1, _c3, _c5, _c8, _c9, _c7, _c6, _c11, _c12, _c10, _c4];
The above error occurs because you set the header option to false (.option("header", false)), so Spark generates the column names _c0, _c1, and so on. But when you cast the generated dataframe to the case class with .as[AadharData], the case class field names (date, registrar, ...) don't match those generated names. Thus the above error happened.
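You can see this with a quick check (reusing the path from the question): printing the schema of the raw read shows only the positional names, all typed as string, so there is no date column to resolve.
val raw = spark.read.option("header", false).csv("D:\\BGH\\Spark\\aadhaar_data.csv")
raw.printSchema()  // prints _c0 through _c12, all string -- no `date` column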
Solution
You should rename the columns to the names used in the case class and also tell Spark SQL to infer the schema, as:
// take the field names from the case class and apply them to the dataframe
val columnNames = classOf[AadharData].getDeclaredFields.map(x => x.getName)
val a = spark.read.option("header", false).option("inferSchema", true)
  .csv("D:\\BGH\\Spark\\aadhaar_data.csv").toDF(columnNames:_*).as[AadharData]
The above error should go away.
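As an alternative sketch (not part of the original answer), you could derive the schema directly from the case class encoder, so both the column names and types come from AadharData without a second pass over the file for inferSchema:
import org.apache.spark.sql.Encoders

// Build a StructType from the case class fields (names and types)
val schema = Encoders.product[AadharData].schema
val a = spark.read.option("header", false).schema(schema)
  .csv("D:\\BGH\\Spark\\aadhaar_data.csv").as[AadharData]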
Upvotes: 5