Hưởng Lê Văn

Reputation: 11

Spark job finished, wrote output file but executor state is KILLED?

I'm trying to run an example from a book (Learning Spark by O'Reilly); it's just a simple word count program. I submitted it, the output was written and it was correct, but on the Spark master page

the executor state was KILLED

There is no error in stderr. Although the output was correct, my trainer said I need to fix this problem, so I've spent a few days on it but can't find a fix.
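I submit the job roughly like this (the master port is my assumption, 7077 is just the standalone default; the jar name and paths match the log below):

spark-submit \
  --class WordCount \
  --master spark://10.0.2.15:7077 \
  Spark-1.0.jar \
  /home/levanhuong221/01.txt \
  /home/levanhuong221/output/master_url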

Here is the log

2018-07-31 17:52:49 INFO  CoarseGrainedExecutorBackend:2611 - Started daemon with process name: 3783@levanhuong
2018-07-31 17:52:49 INFO  SignalUtils:54 - Registered signal handler for TERM
2018-07-31 17:52:49 INFO  SignalUtils:54 - Registered signal handler for HUP
2018-07-31 17:52:49 INFO  SignalUtils:54 - Registered signal handler for INT
2018-07-31 17:52:49 WARN  Utils:66 - Your hostname, levanhuong resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
2018-07-31 17:52:49 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-07-31 17:52:50 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-07-31 17:52:50 INFO  SecurityManager:54 - Changing view acls to: levanhuong221
2018-07-31 17:52:50 INFO  SecurityManager:54 - Changing modify acls to: levanhuong221
2018-07-31 17:52:50 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-07-31 17:52:50 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-07-31 17:52:50 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(levanhuong221); groups with view permissions: Set(); users  with modify permissions: Set(levanhuong221); groups with modify permissions: Set()
2018-07-31 17:52:51 INFO  TransportClientFactory:267 - Successfully created connection to /10.0.2.15:42475 after 186 ms (0 ms spent in bootstraps)
2018-07-31 17:52:51 INFO  SecurityManager:54 - Changing view acls to: levanhuong221
2018-07-31 17:52:51 INFO  SecurityManager:54 - Changing modify acls to: levanhuong221
2018-07-31 17:52:51 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-07-31 17:52:51 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-07-31 17:52:51 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(levanhuong221); groups with view permissions: Set(); users  with modify permissions: Set(levanhuong221); groups with modify permissions: Set()
2018-07-31 17:52:51 INFO  TransportClientFactory:267 - Successfully created connection to /10.0.2.15:42475 after 3 ms (0 ms spent in bootstraps)
2018-07-31 17:52:51 INFO  DiskBlockManager:54 - Created local directory at /tmp/spark-2f1226a0-cace-4e98-b86e-c8a1106f9751/executor-4028ee6e-8aed-48fc-bd2f-5ed99efde94b/blockmgr-c7020392-e1bf-4551-b039-5b3eab7e2e08
2018-07-31 17:52:51 INFO  MemoryStore:54 - MemoryStore started with capacity 93.3 MB
2018-07-31 17:52:52 INFO  CoarseGrainedExecutorBackend:54 - Connecting to driver: spark://CoarseGrainedScheduler@10.0.2.15:42475
2018-07-31 17:52:52 INFO  WorkerWatcher:54 - Connecting to worker spark://Worker@10.0.2.15:35241
2018-07-31 17:52:52 INFO  TransportClientFactory:267 - Successfully created connection to /10.0.2.15:35241 after 20 ms (0 ms spent in bootstraps)
2018-07-31 17:52:52 INFO  WorkerWatcher:54 - Successfully connected to spark://Worker@10.0.2.15:35241
2018-07-31 17:52:52 INFO  CoarseGrainedExecutorBackend:54 - Successfully registered with driver
2018-07-31 17:52:52 INFO  Executor:54 - Starting executor ID 0 on host 10.0.2.15
2018-07-31 17:52:52 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42985.
2018-07-31 17:52:52 INFO  NettyBlockTransferService:54 - Server created on 10.0.2.15:42985
2018-07-31 17:52:52 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-07-31 17:52:52 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(0, 10.0.2.15, 42985, None)
2018-07-31 17:52:52 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(0, 10.0.2.15, 42985, None)
2018-07-31 17:52:52 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(0, 10.0.2.15, 42985, None)
2018-07-31 17:52:52 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 0
2018-07-31 17:52:52 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 1
2018-07-31 17:52:52 INFO  Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2018-07-31 17:52:52 INFO  Executor:54 - Running task 1.0 in stage 0.0 (TID 1)
2018-07-31 17:52:52 INFO  Executor:54 - Fetching spark://10.0.2.15:42475/jars/Spark-1.0.jar with timestamp 1533034366863
2018-07-31 17:52:52 INFO  TransportClientFactory:267 - Successfully created connection to /10.0.2.15:42475 after 4 ms (0 ms spent in bootstraps)
2018-07-31 17:52:52 INFO  Utils:54 - Fetching spark://10.0.2.15:42475/jars/Spark-1.0.jar to /tmp/spark-2f1226a0-cace-4e98-b86e-c8a1106f9751/executor-4028ee6e-8aed-48fc-bd2f-5ed99efde94b/spark-c6b95c11-80a9-495c-9503-29f4c1225caa/fetchFileTemp3888279004875803307.tmp
2018-07-31 17:52:52 INFO  Utils:54 - Copying /tmp/spark-2f1226a0-cace-4e98-b86e-c8a1106f9751/executor-4028ee6e-8aed-48fc-bd2f-5ed99efde94b/spark-c6b95c11-80a9-495c-9503-29f4c1225caa/17384549101533034366863_cache to /usr/local/spark-2.3.1-bin-hadoop2.7/work/app-20180731175247-0012/0/./Spark-1.0.jar
2018-07-31 17:52:52 INFO  Executor:54 - Adding file:/usr/local/spark-2.3.1-bin-hadoop2.7/work/app-20180731175247-0012/0/./Spark-1.0.jar to class loader
2018-07-31 17:52:52 INFO  TorrentBroadcast:54 - Started reading broadcast variable 1
2018-07-31 17:52:52 INFO  TransportClientFactory:267 - Successfully created connection to /10.0.2.15:40191 after 5 ms (0 ms spent in bootstraps)
2018-07-31 17:52:52 INFO  MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 93.3 MB)
2018-07-31 17:52:53 INFO  TorrentBroadcast:54 - Reading broadcast variable 1 took 209 ms
2018-07-31 17:52:53 INFO  MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 5.0 KB, free 93.3 MB)
2018-07-31 17:52:53 INFO  HadoopRDD:54 - Input split: file:/home/levanhuong221/01.txt:0+1650317
2018-07-31 17:52:53 INFO  HadoopRDD:54 - Input split: file:/home/levanhuong221/01.txt:1650317+1650318
2018-07-31 17:52:53 INFO  TorrentBroadcast:54 - Started reading broadcast variable 0
2018-07-31 17:52:53 INFO  MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 93.3 MB)
2018-07-31 17:52:53 INFO  TorrentBroadcast:54 - Reading broadcast variable 0 took 22 ms
2018-07-31 17:52:53 INFO  MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 320.9 KB, free 93.0 MB)
2018-07-31 17:52:54 INFO  LineRecordReader:221 - Found UTF-8 BOM and skipped it
2018-07-31 17:52:55 INFO  Executor:54 - Finished task 1.0 in stage 0.0 (TID 1). 1189 bytes result sent to driver
2018-07-31 17:52:55 INFO  Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 1146 bytes result sent to driver
2018-07-31 17:52:55 INFO  CoarseGrainedExecutorBackend:54 - Got assigned task 2
2018-07-31 17:52:55 INFO  Executor:54 - Running task 0.0 in stage 1.0 (TID 2)
2018-07-31 17:52:55 INFO  MapOutputTrackerWorker:54 - Updating epoch to 1 and clearing cache
2018-07-31 17:52:55 INFO  TorrentBroadcast:54 - Started reading broadcast variable 2
2018-07-31 17:52:55 INFO  MemoryStore:54 - Block broadcast_2_piece0 stored as bytes in memory (estimated size 26.2 KB, free 92.9 MB)
2018-07-31 17:52:55 INFO  TorrentBroadcast:54 - Reading broadcast variable 2 took 32 ms
2018-07-31 17:52:55 INFO  MemoryStore:54 - Block broadcast_2 stored as values in memory (estimated size 72.5 KB, free 92.9 MB)
2018-07-31 17:52:55 INFO  FileOutputCommitter:108 - File Output Committer Algorithm version is 1
2018-07-31 17:52:55 INFO  MapOutputTrackerWorker:54 - Don't have map outputs for shuffle 0, fetching them
2018-07-31 17:52:55 INFO  MapOutputTrackerWorker:54 - Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@10.0.2.15:42475)
2018-07-31 17:52:55 INFO  MapOutputTrackerWorker:54 - Got the output locations
2018-07-31 17:52:56 INFO  ShuffleBlockFetcherIterator:54 - Getting 2 non-empty blocks out of 2 blocks
2018-07-31 17:52:56 INFO  ShuffleBlockFetcherIterator:54 - Started 0 remote fetches in 125 ms
2018-07-31 17:52:56 INFO  ShuffleBlockFetcherIterator:54 - Getting 2 non-empty blocks out of 2 blocks
2018-07-31 17:52:56 INFO  ShuffleBlockFetcherIterator:54 - Started 0 remote fetches in 1 ms
2018-07-31 17:52:56 INFO  FileOutputCommitter:535 - Saved output of task 'attempt_20180731175250_0006_m_000000_0' to file:/home/levanhuong221/output/master_url/_temporary/0/task_20180731175250_0006_m_000000
2018-07-31 17:52:56 INFO  SparkHadoopMapRedUtil:54 - attempt_20180731175250_0006_m_000000_0: Committed
2018-07-31 17:52:56 INFO  Executor:54 - Finished task 0.0 in stage 1.0 (TID 2). 1545 bytes result sent to driver
2018-07-31 17:52:56 INFO  CoarseGrainedExecutorBackend:54 - Driver commanded a shutdown
2018-07-31 17:52:56 ERROR CoarseGrainedExecutorBackend:43 - RECEIVED SIGNAL TERM

Edit: added the Spark job code

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;
import java.io.Serializable;
import java.util.Arrays;
import java.util.Iterator;
public class WordCount implements Serializable {
    public static void main(String[] args) {

        // Create a Java Spark context.
        SparkConf conf = new SparkConf().setAppName("wordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Load our input data.
        JavaRDD<String> input = sc.textFile(args[0]);
        // Split up into words.
        JavaRDD<String> words = input.flatMap(
                new FlatMapFunction<String, String>() {
                    public Iterator<String> call(String x) {
                        return Arrays.asList(x.split("\\W")).iterator();
                    }});
        // Transform into pairs and count.
        JavaPairRDD<String, Integer> counts = words.mapToPair(
                new PairFunction<String, String, Integer>(){
                    public Tuple2<String, Integer> call(String x){
                        return new Tuple2<>(x.toLowerCase(), 1);
                    }}).reduceByKey(new Function2<Integer, Integer, Integer>(){
            public Integer call(Integer x, Integer y){ return x + y;}});
        // Save the word count back out to a text file, causing evaluation.
        counts.coalesce(1).saveAsTextFile(args[1]);
        sc.close();
    }
}

Upvotes: 1

Views: 1237

Answers (1)

DNA

Reputation: 42586

That error message appears to be normal and expected behaviour (though misleading, in that it is logged at ERROR level), according to the discussion on the Spark Jira issue SPARK-21733, which was closed as "Not a Problem". You can see the sequence in your own log: the tasks finish, the driver commands a shutdown, and only then does the executor receive SIGTERM, so the KILLED state reflects a deliberate teardown rather than a failure.

One of the Spark developers states:

"This is because executor is killed by NM with SIGTERM, this is quite normal for Spark on YARN application, I don't think there's an issue beside this error log. Everything will be cleaned out with shutdown hook even SIGTERM received, so it should be fine like clean stop."

One of the other commenters suggests that you can avoid this error by setting

spark.network.timeout=420000

(though without an explanation of why this helps; I have not tested it).
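If you want to try it, a minimal sketch of where the setting would go, either on the spark-submit command line:

spark-submit --conf spark.network.timeout=420000 --class WordCount Spark-1.0.jar <input> <output>

or programmatically in the SparkConf from your code (the value is copied verbatim from the Jira comment, and I have not verified that it changes anything here):

SparkConf conf = new SparkConf()
        .setAppName("wordCount")
        .set("spark.network.timeout", "420000"); // value suggested in the SPARK-21733 comments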

Upvotes: 2
