Reputation: 248
I run a Spark job that writes data to HBase, but it fails with a NoSuchMethodError:
15/10/23 18:45:21 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, dn18-formal.i.nease.net): java.lang.NoSuchMethodError: com.google.common.io.ByteStreams.limit(Ljava/io/InputStream;J)Ljava/io/InputStream;
I found a guava.jar in the hadoop/hbase directory; its version is 12.0, but com.google.common.io.ByteStreams.limit only exists since Guava 14.0, which is why the NoSuchMethodError occurs.
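To pin down which jar the class is actually loaded from on the executors, a small diagnostic like the following can be run inside the job (just a sketch, using the same rdd as in the code below):

// Prints, on each executor, the jar that provides Guava's ByteStreams.
rdd.foreachPartition { _ =>
  val src = classOf[com.google.common.io.ByteStreams]
    .getProtectionDomain.getCodeSource.getLocation
  println("Guava loaded from: " + src)
}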
I tried running spark-submit with --jars, but got the same error. I also tried adding

configuration.set("spark.executor.extraClassPath", "/home/ljh")
configuration.set("spark.driver.userClassPathFirst","true");

to my code, still the same.
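For reference, the spark-submit attempt looked roughly like this (the main class and paths are placeholders):

spark-submit \
  --class com.example.MyJob \
  --jars /path/to/guava-14.0.1.jar \
  my-app.jar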
How can I solve this? How do I remove the guava.jar under hadoop/hbase from the classpath? And why is the guava.jar in the Spark directory not used?
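Or do these properties have to go on the SparkConf before the SparkContext is created, rather than on the HBase Configuration? Something like this sketch (spark.executor.userClassPathFirst is the executor-side counterpart, marked experimental in Spark 1.x):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: Spark reads these properties from the SparkConf at context
// creation time; setting them on an HBaseConfiguration has no effect.
val sparkConf = new SparkConf()
  .setAppName("hbase-writer")
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")
val sc = new SparkContext(sparkConf)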
Here is my code:
rdd.foreach({ res =>
val configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.property.clientPort", "2181");
configuration.set("hbase.zookeeper.quorum", “ip.66");
configuration.set("hbase.master", “ip:60000");
configuration.set("spark.executor.extraClassPath", "/home/ljh")
configuration.set("spark.driver.userClassPathFirst","true");
val hadmin = new HBaseAdmin(configuration);
configuration.clear();
configuration.addResource("/home/hadoop/conf/core-default.xml")
configuration.addResource("/home/hadoop/conf/core-site.xml")
configuration.addResource("/home/hadoop/conf/mapred-default.xml")
configuration.addResource("/home/hadoop/conf/mapred-site.xml")
configuration.addResource("/home/hadoop/conf/yarn-default.xml")
configuration.addResource("/home/hadoop/conf/yarn-site.xml")
configuration.addResource("/home/hadoop/conf/hdfs-default.xml")
configuration.addResource("/home/hadoop/conf/hdfs-site.xml")
configuration.addResource("/home/hadoop/conf/hbase-default.xml")
configuration.addResource("/home/ljhn1829/hbase-site.xml")
val table = new HTable(configuration, "ljh_test2");
var put = new Put(Bytes.toBytes(res.toKey()));
put.add(Bytes.toBytes("basic"), Bytes.toBytes("name"), Bytes.toBytes(res.totalCount + "\t" + res.positiveCount));
table.put(put);
table.flushCommits()
})
And the error message:
15/10/23 19:06:42 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, gdc-dn126-formal.i.nease.net): java.lang.NoSuchMethodError: com.google.common.io.ByteStreams.limit(Ljava/io/InputStream;J)Ljava/io/InputStream;
    at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.nextBatchStream(ExternalAppendOnlyMap.scala:420)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.<init>(ExternalAppendOnlyMap.scala:392)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:207)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:63)
    at org.apache.spark.util.collection.Spillable$class.maybeSpill(Spillable.scala:83)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.maybeSpill(ExternalAppendOnlyMap.scala:63)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129)
    at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:60)
    at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:46)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:90)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/10/23 19:06:42 INFO TaskSetManager: Starting task 0.1 in stage 1.0 (TID 2, gdc-dn166-formal.i.nease.net, PROCESS_LOCAL, 1277 bytes)
15/10/23 19:06:42 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on gdc-dn166-formal.i.nease.net:3838 (size: 3.2 KB, free: 1060.3 MB)
15/10/23 19:06:42 ERROR YarnScheduler: Lost executor 1 on gdc-dn126-formal.i.nease.net: remote Rpc client disassociated
15/10/23 19:06:42 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://[email protected]:1656] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/10/23 19:06:42 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 1.0
15/10/23 19:06:42 INFO DAGScheduler: Executor lost: 1 (epoch 1)
15/10/23 19:06:42 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
15/10/23 19:06:42 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, gdc-dn126-formal.i.nease.net, 44635)
15/10/23 19:06:42 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
15/10/23 19:06:42 INFO ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 1 (0/1, false)
15/10/23 19:06:42 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to gdc-dn166-formal.i.nease.net:28595
15/10/23 19:06:42 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 84 bytes
15/10/23 19:06:42 WARN TaskSetManager: Lost task 0.1 in stage 1.0 (TID 2, gdc-dn166-formal.i.nease.net): FetchFailed(null, shuffleId=1, mapId=-1, reduceId=0, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:389)
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:386)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:385)
    at org.apache.spark.MapOutputTracker.getServerStatuses(MapOutputTracker.scala:172)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:42)
    at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:40)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:90)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Upvotes: 1
Views: 1659
Reputation: 1
Add the following dependency to your pom.xml:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>14.0.1</version>
</dependency>
This is needed because, as the Guava source shows (https://guava.dev/releases/19.0/api/docs/src-html/com/google/common/io/ByteStreams.html#line.596), ByteStreams.limit only exists since version 14.0:
/**
 * Wraps a {@link InputStream}, limiting the number of bytes which can be
 * read.
 *
 * @param in the input stream to be wrapped
 * @param limit the maximum number of bytes to be read
 * @return a length-limited {@link InputStream}
 * @since 14.0 (since 1.0 as com.google.common.io.LimitInputStream)
 */
public static InputStream limit(InputStream in, long limit) {
  return new LimitedInputStream(in, limit);
}
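If the cluster classpath still puts Hadoop's guava-12 ahead of your newer version at runtime, another common fix is to shade (relocate) Guava into your application jar so the two copies can no longer clash. A minimal maven-shade-plugin sketch (the plugin version and the relocated package prefix here are placeholders):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <!-- Rewrites com.google.common.* references in your classes
                         and bundles the renamed Guava, so the cluster's older
                         copy can no longer shadow it. -->
                    <relocation>
                        <pattern>com.google.common</pattern>
                        <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>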
Upvotes: 0