toobee

Reputation: 2752

Spark: Association with remote system lost akka.tcp (disassociated)

I am using Spark 1.3.0 with Hadoop/YARN and I am getting this error message:

WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkYarnAM@virtm2:51482] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].

I read about it and found that setting the Akka heartbeat interval to 100 is supposed to solve this problem:

SparkConf conf = new SparkConf().setAppName("Name");
conf.set("spark.akka.heartbeat.interval", "100");

Unfortunately, it does not in my case. The job fails with this error as the cause a few seconds after I hit enter.

I submit the job with this command:

/usr/local/spark130/bin/spark-submit \
  --class de.unidue.langTecspark.TweetTag \
  --master yarn-client \
  --executor-memory 2g \
  --driver-memory 4g \
  /home/huser/sparkIt-1.0-standalone.jar
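As an aside, the same Akka setting can also be passed at submit time via spark-submit's `--conf` flag instead of being hard-coded in the application; a sketch, reusing the jar and class from the question:

```shell
# Sketch: --conf sets any Spark property at submit time,
# equivalent to conf.set(...) inside the application.
/usr/local/spark130/bin/spark-submit \
  --class de.unidue.langTecspark.TweetTag \
  --master yarn-client \
  --executor-memory 2g \
  --driver-memory 4g \
  --conf spark.akka.heartbeat.interval=100 \
  /home/huser/sparkIt-1.0-standalone.jar
```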

The logs of the executing container on the nodes say the ApplicationMaster got killed:

5 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM

I attempted to run a minimal example, this one (it essentially does nothing; just to see whether it has the same problem):

import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("Minimal");
    JavaSparkContext sc = new JavaSparkContext(conf);
    List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
    JavaRDD<Integer> distData = sc.parallelize(data);
    sc.close();
}

Again I get the ApplicationMaster-killed error in the log. Whatever is wrong here is not memory related, but I am having real difficulty tracking this problem down.

I have a mini-distributed setup with 4 machines for data/processing and 1 for the namenode.

Any help highly appreciated!

Upvotes: 3

Views: 1596

Answers (1)

Arnav

Reputation: 153

This problem can occur when the master and slaves are not started properly. Start the master and slaves using ./sbin/start-all.sh, then submit your application.
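A sketch of what this suggests, assuming the install path from the question. Note that `sbin/start-all.sh` starts the Spark *standalone* master and workers, whereas with `--master yarn-client` the executors are launched by YARN, so this may not apply to the setup in the question:

```shell
# Start the standalone master and worker daemons (standalone mode only).
/usr/local/spark130/sbin/start-all.sh

# Verify the daemons are up: jps should list Master and Worker processes.
jps
```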

Upvotes: 0
