Reputation: 2752
I am using Spark 1.3.0 with Hadoop/Yarn and I am having an error message which says
WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkYarnAM@virtm2:51482] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
I read about it and found that setting the akka heartbeat interval to 100 would solve this problem:
SparkConf conf = new SparkConf().setAppName("Name");
conf.set("spark.akka.heartbeat.interval", "100");
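For what it's worth, the same Akka settings can also be passed at submit time via --conf, so the jar does not have to be rebuilt. A sketch under the assumption that the related spark.akka.heartbeat.pauses setting might also matter (the values here are illustrative, not a known fix):

```shell
# Pass the Akka heartbeat settings on the command line (Spark 1.x);
# pauses should be larger than the interval.
/usr/local/spark130/bin/spark-submit \
  --master yarn-client \
  --conf spark.akka.heartbeat.interval=100 \
  --conf spark.akka.heartbeat.pauses=6000 \
  --class de.unidue.langTecspark.TweetTag \
  /home/huser/sparkIt-1.0-standalone.jar
```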
Unfortunately, it does not help in my case. The job fails with this error as the cause a few seconds after I hit enter.
I submit the job with this command:
/usr/local/spark130/bin/spark-submit \
--class de.unidue.langTecspark.TweetTag \
--master yarn-client \
--executor-memory 2g \
--driver-memory 4g \
/home/huser/sparkIt-1.0-standalone.jar
The logs of the executing container on the nodes say the ApplicationMaster was killed:
5 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
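As an aside, SIGTERM (signal 15) usually means something external, typically the YARN NodeManager or ResourceManager, killed the container rather than the JVM crashing on its own. The actual kill reason is often only visible in the aggregated YARN logs; a sketch, assuming log aggregation is enabled and using a hypothetical application id taken from the ResourceManager UI:

```shell
# Fetch all container logs for the finished application; the NodeManager's
# kill reason appears near the SIGTERM line. The application id below is
# a placeholder.
yarn logs -applicationId application_1423456789012_0001 | less
```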
I attempted to run a minimal example, this one (it essentially does nothing; it is just to see whether it shows the same problem):
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
public class Minimal {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("Minimal");
        JavaSparkContext sc = new JavaSparkContext(conf);
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> distData = sc.parallelize(data);
        sc.close();
    }
}
I again get the ApplicationMaster-killed error in the log. Whatever is wrong here is not memory related, but I am having real difficulties tracking this problem down.
I have a mini-distributed setup with 4 machines for data/processing and 1 for the namenode.
Any help highly appreciated!
Upvotes: 3
Views: 1596
Reputation: 153
This problem can occur when the master and slaves are not started properly. Start the master and slaves using ./sbin/start-all.sh,
then submit your application.
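A quick way to check whether the daemons actually came up is jps on each machine; a sketch (what you expect to see depends on the deployment, the process names below are for Spark standalone daemons):

```shell
# List running JVMs on this host. On the master you would expect a
# "Master" entry, on each slave a "Worker" entry.
jps
```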
Upvotes: 0