Reputation: 800
Hi, I am using the Spark Java API to fetch data from Hive. The code works on a single-node Hadoop cluster, but when I run it on a multi-node Hadoop cluster it throws this error:
org.apache.spark.SparkException: Detected yarn-cluster mode, but isn't running on a cluster. Deployment to YARN is not supported directly by SparkContext. Please use spark-submit.
Note: I used local as the master for the single-node cluster and yarn-cluster for the multi-node cluster.
This is my Java code:
SparkConf sparkConf = new SparkConf().setAppName("Hive").setMaster("yarn-cluster");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
HiveContext sqlContext = new HiveContext(ctx.sc());
org.apache.spark.sql.Row[] result = sqlContext.sql("Select * from Tablename").collect();
I also tried changing the master to local, and now it throws an UnknownHostException (see the error logs in the update).
Can anyone help me with this?
Updated
Error logs
15/08/05 11:30:25 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
15/08/05 11:30:25 INFO ObjectStore: Initialized ObjectStore
15/08/05 11:30:25 INFO HiveMetaStore: Added admin role in metastore
15/08/05 11:30:25 INFO HiveMetaStore: Added public role in metastore
15/08/05 11:30:25 INFO HiveMetaStore: No user is added in admin role, since config is empty
15/08/05 11:30:25 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/08/05 11:30:25 INFO HiveMetaStore: 0: get_table : db=default tbl=activity
15/08/05 11:30:25 INFO audit: ugi=labuser ip=unknown-ip-addr cmd=get_table : db=default tbl=activity
15/08/05 11:30:25 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
15/08/05 11:30:25 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/08/05 11:30:26 INFO MemoryStore: ensureFreeSpace(399000) called with curMem=0, maxMem=1030823608
15/08/05 11:30:26 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 389.6 KB, free 982.7 MB)
15/08/05 11:30:26 INFO MemoryStore: ensureFreeSpace(34309) called with curMem=399000, maxMem=1030823608
15/08/05 11:30:26 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 33.5 KB, free 982.7 MB)
15/08/05 11:30:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.16.100.7:61775 (size: 33.5 KB, free: 983.0 MB)
15/08/05 11:30:26 INFO SparkContext: Created broadcast 0 from collect at Hive.java:29
Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoopcluster
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:258)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:153)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:602)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:547)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:139)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1783)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:885)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.RDD.collect(RDD.scala:884)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:105)
at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1255)
at com.Hive.main(Hive.java:29)
Caused by: java.net.UnknownHostException: hadoopcluster
... 44 more
Upvotes: 1
Views: 1184
Reputation: 13346
As the exception indicates, yarn-cluster mode cannot be used directly from the SparkContext; for YARN, the message tells you to use spark-submit. You can, however, run on a standalone multi-node cluster directly from the SparkContext. First start your standalone Spark cluster, then set sparkConf.setMaster("spark://HOST:PORT"), where HOST:PORT is the address of your Spark master. I hope this solves your problem.
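To make the spark://HOST:PORT format concrete, here is a minimal sketch in plain Java that only builds the master URL string. The class name MasterUrl, the helper standaloneMasterUrl, and the host name "master-host" are hypothetical names for illustration; 7077 is the default port of a standalone Spark master, but your cluster may use a different one. The string it returns is what you would pass to sparkConf.setMaster(...) in the code from the question.

```java
public class MasterUrl {
    // Default port of a standalone Spark master; check your own cluster's setting.
    static final int DEFAULT_STANDALONE_PORT = 7077;

    // Hypothetical helper: builds a master URL of the form spark://HOST:PORT.
    static String standaloneMasterUrl(String host, int port) {
        if (host == null || host.trim().isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "spark://" + host.trim() + ":" + port;
    }

    public static void main(String[] args) {
        // "master-host" is a placeholder; pass the real master host as the first argument.
        String host = args.length > 0 ? args[0] : "master-host";
        String master = standaloneMasterUrl(host, DEFAULT_STANDALONE_PORT);
        // In the question's code this value would replace "yarn-cluster":
        //   new SparkConf().setAppName("Hive").setMaster(master);
        System.out.println(master);
    }
}
```

The host in the URL must be resolvable from the machine that runs the driver, so use the same host name or IP address that the standalone master was started with.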
Upvotes: 1