Rohith Uppala

Reputation: 46

error in initSerDe : java.lang.ClassNotFoundException class org.apache.hive.hcatalog.data.JsonSerDe not found

I am trying to read data from a Hive table using Spark SQL (Scala), and it is throwing the following error:

ERROR hive.log: error in initSerDe: java.lang.ClassNotFoundException Class org.apache.hive.hcatalog.data.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.data.JsonSerDe not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2255)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:392)
        at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:274)
        at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:256)
        at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:607)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$7.apply(HiveClientImpl.scala:358)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$7.apply(HiveClientImpl.scala:355)
        at scala.Option.map(Option.scala:146)

The Hive table is stored as:

ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.SequenceFileInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'

I added /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar using :require /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar and can see "Added to classpath".

I also tried adding the JAR file via SparkSession.config(). Neither approach worked, and the Stack Overflow answers I checked did not resolve the issue.
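For reference, the SerDe jar generally has to be on the classpath before the session (and the Hive client inside it) starts, which is why adding it from inside the REPL can come too late. A minimal sketch of launching spark-shell with the jar up front, using the CDH parcel path from above:

```shell
# Sketch: pass the HCatalog SerDe jar at launch time so both driver and
# executors see it before the Hive metadata is first deserialized.
# (Path assumes a CDH parcel install, as in the question.)
spark-shell --jars /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar
```

This is a launch-time configuration sketch, not a guaranteed fix; if the error comes from the metastore side rather than the Spark app, the jar must be added there instead (see the answers below... each answer addresses a different place the class can be missing).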

CREATE EXTERNAL TABLE `test.record`(
  `test_id` string COMMENT 'from deserializer', 
  `test_name` string COMMENT 'from deserializer', 
  `processed_datetime` timestamp COMMENT 'from deserializer'
  )
PARTITIONED BY ( 
  `filedate` date)
ROW FORMAT SERDE 
  'org.apache.hive.hcatalog.data.JsonSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.SequenceFileInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'

I expect to read the data from the Hive table and store it in a DataFrame, so that

var tempDF =sql("SELECT * FROM test.record WHERE filedate = '2019-06-03' LIMIT 5")
tempDF.show()

should work

Upvotes: 2

Views: 8325

Answers (4)

Yulei Yang

Reputation: 1

This error occurs in the remote HMS (Hive Metastore Service), not in your Spark app. You should add the hive-hcatalog-core jar to the metastore's classpath and restart it.
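A hedged sketch of what that could look like on the metastore host (the source path, destination lib directory, and service name are assumptions that vary by distribution; adjust to yours):

```shell
# Sketch: make the SerDe jar visible to the Hive Metastore Service.
# Source path assumes a CDH parcel install; destination and service
# name are placeholders for your distribution's layout.
cp /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar /usr/lib/hive/lib/

# Restart the metastore so it picks up the new classpath entry.
sudo systemctl restart hive-metastore
```

The key point of this answer is that the stack trace originates in MetaStoreUtils.getDeserializer, so fixing only the Spark side may not be enough when the metastore runs remotely.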

Upvotes: 0

Dyno Fu

Reputation: 9044

On EMR, these jars are:

  • /usr/lib/hive/lib/hive-hcatalog-core.jar
  • /usr/lib/hive/lib/hive-serde.jar
  • /usr/lib/hive/lib/hive-common.jar
$ rpm -qf /usr/lib/hive/lib/hive-serde.jar
hive-3.1.3.amzn.1-1.amzn2.noarch

Passing them to --jars should resolve the ClassNotFoundException in spark.sql.
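The suggestion above can be sketched as a spark-submit invocation (the application jar name is a placeholder; the Hive paths are the EMR ones listed in this answer):

```shell
# Sketch: hand the EMR Hive jars to spark-submit so the SerDe class is
# on both the driver and executor classpaths.
# my-spark-app.jar is a placeholder for your application artifact.
spark-submit \
  --jars /usr/lib/hive/lib/hive-hcatalog-core.jar,/usr/lib/hive/lib/hive-serde.jar,/usr/lib/hive/lib/hive-common.jar \
  my-spark-app.jar
```

Note that --jars takes a comma-separated list, not repeated flags.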

Upvotes: 0

user2846168

Reputation: 41

Add the required jar as follows rather than copying it to every node in the cluster:

  1. In conf/spark-defaults.conf, add the following configs:

    spark.driver.extraClassPath /fullpath/hive-hcatalog-core-3.1.2.jar
    spark.executor.extraClassPath /fullpath/hive-hcatalog-core-3.1.2.jar

  2. Or in spark-sql, execute an ADD JAR statement before querying:

    ADD JAR /fullpath/hive-hcatalog-core-3.1.2.jar

Upvotes: 0

elprup

Reputation: 1998

One quick way to solve this is to copy the jar file into Spark.

The source file, hive-hcatalog-core-3.1.2.jar, is in the Hive lib directory; copy it to the jars directory under the Spark home.

I also tried modifying the hive.aux.jars.path config in hive-site.xml, but it didn't work. If anyone knows of a Spark configuration for loading extra jars, please comment.
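A minimal sketch of that copy, assuming HIVE_HOME and SPARK_HOME point at the respective install directories (both are assumptions; substitute your actual paths):

```shell
# Sketch: copy the SerDe jar from Hive's lib directory into Spark's
# jars directory, which Spark scans at startup.
# $HIVE_HOME and $SPARK_HOME are placeholders for your install paths.
cp "$HIVE_HOME/lib/hive-hcatalog-core-3.1.2.jar" "$SPARK_HOME/jars/"
```

The trade-off is that this must be done on every machine running a Spark driver or executor, which is what the previous answer's classpath configs avoid.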

Upvotes: 0
