toobee

Reputation: 2752

Spark: Association with remote system lost akka.tcp (disassociated)

I am using Spark 1.3.0 with Hadoop/YARN and I am getting this error message:

WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkYarnAM@virtm2:51482] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].

I read about it and found that setting the Akka heartbeat interval to 100 is supposed to solve this problem:

SparkConf conf = new SparkConf().setAppName("Name");
conf.set("spark.akka.heartbeat.interval", "100");

Unfortunately, it does not in my case. The job fails with this error as the cause a few seconds after I hit enter.

I submit the job with this command:

/usr/local/spark130/bin/spark-submit \
  --class de.unidue.langTecspark.TweetTag \
  --master yarn-client \
  --executor-memory 2g \
  --driver-memory 4g \
  /home/huser/sparkIt-1.0-standalone.jar
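As an aside, the same Akka setting can also be passed at submit time via spark-submit's `--conf` flag instead of being hard-coded in the application; a sketch, reusing the jar and class from the question:

```shell
# Sketch: --conf sets any Spark property at submit time,
# equivalent to conf.set(...) inside the application.
/usr/local/spark130/bin/spark-submit \
  --class de.unidue.langTecspark.TweetTag \
  --master yarn-client \
  --executor-memory 2g \
  --driver-memory 4g \
  --conf spark.akka.heartbeat.interval=100 \
  /home/huser/sparkIt-1.0-standalone.jar
```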

The logs of the executing container on the nodes say the ApplicationMaster got killed:

5 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM

I attempted to run a minimal example, this one (it essentially does nothing; just to see whether it has the same problem):

import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("Minimal");
    JavaSparkContext sc = new JavaSparkContext(conf);
    List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
    JavaRDD<Integer> distData = sc.parallelize(data);
    sc.close();
}

Again I get the ApplicationMaster-killed error in the log. Whatever is wrong here is not memory related, but I am having real difficulty tracking this problem down.

I have a mini-distributed setup with 4 machines for data/processing and 1 for the namenode.

Any help highly appreciated!

Upvotes: 3

Views: 1596

Answers (1)

Arnav

Reputation: 153

This problem can occur when the master and slaves are not started properly. Start the master and slaves using ./sbin/start-all.sh, then submit your application.
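A sketch of what this suggests, assuming the install path from the question. Note that `sbin/start-all.sh` starts the Spark *standalone* master and workers, whereas with `--master yarn-client` the executors are launched by YARN, so this may not apply to the setup in the question:

```shell
# Start the standalone master and worker daemons (standalone mode only).
/usr/local/spark130/sbin/start-all.sh

# Verify the daemons are up: jps should list Master and Worker processes.
jps
```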

Upvotes: 0
