Reputation: 65
I am trying to use the Jersey REST API to fetch records from an HBase table through a Java Spark program, and I am getting the error shown below; however, when I access the HBase table through the Spark jar, the code executes without errors.
I have 2 worker nodes for HBase and 2 worker nodes for Spark, which are maintained by the same master.
WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.31.16.140): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
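For reference, the read path in the Java Spark program is roughly along the following lines (a minimal sketch, not the actual code; the class name HBaseReadSketch and the table name "my_table" are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseReadSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("HBaseReadSketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // HBase scan configuration; "my_table" is a placeholder table name
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table");

        // The executors deserialize the HBase input splits, so the HBase
        // client/server/protocol jars must be on the executor classpath;
        // the answers below point to missing jars as the cause of the
        // "unread block data" error above.
        JavaPairRDD<ImmutableBytesWritable, Result> rows =
                sc.newAPIHadoopRDD(hbaseConf, TableInputFormat.class,
                        ImmutableBytesWritable.class, Result.class);

        System.out.println("Row count: " + rows.count());
        sc.stop();
    }
}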
Upvotes: 6
Views: 7929
Reputation: 3086
CDP/CDH:
Step 1: Copy the hbase-site.xml file into the /etc/spark/conf/ directory:
cp /opt/cloudera/parcels/CDH/lib/hbase/conf/hbase-site.xml /etc/spark/conf/
Step 2: Add the following libraries to spark-submit/spark-shell; example invocations follow the list.
/opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar
/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar
/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar
Spark-shell:
sudo -u hive spark-shell --master yarn --jars /opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar,/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar,/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar --files /etc/spark/conf/hbase-site.xml
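Spark-submit (a sketch only; the main class com.example.HBaseReadJob and the application jar hbase-read-job.jar are placeholders):
sudo -u hive spark-submit --master yarn --class com.example.HBaseReadJob --jars /opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar,/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar,/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar --files /etc/spark/conf/hbase-site.xml hbase-read-job.jar
Note that the --jars value is a single comma-separated list with no spaces.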
Upvotes: 0
Reputation: 21
I've met the same problem on CDH 5.4.0 when submitting a Spark job implemented with the Java API; here are my solutions:
Solution 1: Use spark-submit:
--jars zookeeper-3.4.5-cdh5.4.0.jar,
hbase-client-1.0.0-cdh5.4.0.jar,
hbase-common-1.0.0-cdh5.4.0.jar,
hbase-server-1.0.0-cdh5.4.0.jar,
hbase-protocol-1.0.0-cdh5.4.0.jar,
htrace-core-3.1.0-incubating.jar,
// custom jars which are needed in the spark executors
Solution 2: Use SparkConf in code:
SparkConf sparkConf = new SparkConf();
sparkConf.setJars(new String[]{
        "zookeeper-3.4.5-cdh5.4.0.jar",
        "hbase-client-1.0.0-cdh5.4.0.jar",
        "hbase-common-1.0.0-cdh5.4.0.jar",
        "hbase-server-1.0.0-cdh5.4.0.jar",
        "hbase-protocol-1.0.0-cdh5.4.0.jar",
        "htrace-core-3.1.0-incubating.jar"
        // plus any custom jars that are needed on the spark executors
});
To summarize:
the problem is caused by missing jars in the Spark project. You need to add these jars to your project classpath, and in addition use the two solutions above to distribute the jars to your Spark cluster.
Upvotes: 0
Reputation: 435
OK, I may know your problem, because I have just experienced it.
The reason is very likely that some HBase jars are missing: during the Spark run, Spark needs the HBase jars to read the data, and if they are not present, exceptions are thrown. What should you do? It is easy:
before you submit the job, add the --jars parameter and include the jars listed below (a full command is sketched after the list):
--jars
/ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,
/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/guava-12.0.1.jar,
/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/htrace-core-2.04.jar
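Assembled into one command, the submit would look roughly like this (a sketch only; the main class com.example.YourMainClass and your-application.jar are placeholders, and the --jars list must contain no spaces):
spark-submit --master yarn --class com.example.YourMainClass --jars /ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/guava-12.0.1.jar,/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/htrace-core-2.04.jar your-application.jar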
If it works, enjoy it!
Upvotes: 6