Dyin

Reputation: 5366

Hadoop 2.5.2 on Mesos 0.21.0 - Failed to fetch URIs for container

I'm trying to run a simple WordCount example on Mesos with Hadoop 2.5.2. I've successfully set up HDFS (there is actually a YARN setup behind this as well, and it is working fine). The Mesos master is running and has 4 slaves connected to it. The Hadoop library for Mesos is 0.0.8.

The configuration for Hadoop 2.5.2 is (mapred-site.xml):

<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>*.*.*.*:9001</value>
        </property>
        <property>
                <name>mapred.job.tracker.http.address</name>
                <value>*.*.*.*:50030</value>
        </property>
        <property>
                <name>mapred.jobtracker.taskScheduler</name>
                <value>org.apache.hadoop.mapred.MesosScheduler</value>
        </property>
        <property>
                <name>mapred.mesos.taskScheduler</name>
                <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
        </property>
        <property>
                <name>mapred.mesos.master</name>
                <value>*.*.*.*:5050</value>
        </property>
        <property>
                <name>mapred.mesos.executor.uri</name>
                <value>hdfs://*.*.*.*:9000/hadoop-2.5.0-cdh5.2.0.tgz</value>
        </property>
</configuration>
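
For completeness, the JobTracker is launched roughly like this (a sketch; the jar and libmesos paths below are illustrative and depend on the install):

    # Sketch of the JobTracker launch; exact paths depend on the install
    export HADOOP_CLASSPATH=/path/to/hadoop-mesos-0.0.8.jar:$HADOOP_CLASSPATH
    export MESOS_NATIVE_JAVA_LIBRARY=/opt/mesos-0.21.0/build/src/.libs/libmesos.so
    hadoop jobtracker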

I've got the following logs from all my slaves (example):

dbpc42: I1202 00:03:12.066195 11232 launcher.cpp:137] Forked child with pid '18714' for container 'c10c2d2b-bf4b-469b-97a2-60c9720773b4'

dbpc42: I1202 00:03:12.068272 11232 containerizer.cpp:571] Fetching URIs for container 'c10c2d2b-bf4b-469b-97a2-60c9720773b4' using command '/opt/mesos-0.21.0/build/src/mesos-fetcher'

dbpc42: I1202 00:03:12.140894 11226 containerizer.cpp:946] Destroying container 'c10c2d2b-bf4b-469b-97a2-60c9720773b4'

dbpc42: E1202 00:03:12.141315 11229 slave.cpp:2787] Container 'c10c2d2b-bf4b-469b-97a2-60c9720773b4' for executor 'executor_Task_Tracker_93' of framework '20141201-225046-698725789-5050-19765-0003' failed to start: Failed to fetch URIs for container 'c10c2d2b-bf4b-469b-97a2-60c9720773b4': exit status 256

dbpc42: I1202 00:03:12.242033 11231 containerizer.cpp:1117] Executor for container 'c10c2d2b-bf4b-469b-97a2-60c9720773b4' has exited

dbpc42: I1202 00:03:12.243896 11225 slave.cpp:2898] Executor 'executor_Task_Tracker_93' of framework 20141201-225046-698725789-5050-19765-0003 exited with status 1

The JobTracker is running fine, but with the hadoop jar command the job gets stuck at map 0% reduce 0%. In the Mesos cluster information the TASKS_LOST counter keeps climbing until I kill the job. Mesos and the JobTracker run as root; the job runs as user hdfs.

What is this URI problem all about?

Thank you for your kind help or hint!

(I'll provide more information if needed.)

UPDATE

Starting a slave on the same PC where the master runs gets tasks into the STAGING status, 5 of them each time.

The mapred.mesos.executor.uri has been changed from the IP to dbpc41 (the master PC).

<property>
     <name>mapred.mesos.executor.uri</name>
     <value>hdfs://dbpc41:9000/hadoop-2.5.0-cdh5.2.0.tgz</value>
</property>

The other 4 slaves are still losing tasks, probably because they cannot fetch the executor URI; one way to verify that is to attempt the fetch by hand, as sketched below.
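
A manual fetch from a failing slave should reproduce the fetcher's behaviour (a sketch; the URI is the one from mapred.mesos.executor.uri, and the user should be the one the tasks run as):

    # Run on a failing slave, as user hdfs (the user the job runs as)
    hadoop fs -ls hdfs://dbpc41:9000/hadoop-2.5.0-cdh5.2.0.tgz
    hadoop fs -copyToLocal hdfs://dbpc41:9000/hadoop-2.5.0-cdh5.2.0.tgz /tmp/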

These are the logs from the 5th slave, the one running on the same PC as the master:

I1202 16:17:57.434345 1405 containerizer.cpp:571] Fetching URIs for container '5f33123b-00eb-4e05-9dcc-30f16f5eee44' using command '/opt/mesos-0.21.0/build/src/mesos-fetcher'
I1202 16:18:08.620708 1412 slave.cpp:2840] Monitoring executor 'executor_Task_Tracker_445' of framework '20141201-225046-698725789-5050-19765-0012' in container '5f33123b-00eb-4e05-9dcc-30f16f5eee44'
I1202 16:18:09.022902 1407 containerizer.cpp:1117] Executor for container '5f33123b-00eb-4e05-9dcc-30f16f5eee44' has exited
I1202 16:18:09.022964 1407 containerizer.cpp:946] Destroying container '5f33123b-00eb-4e05-9dcc-30f16f5eee44'
W1202 16:18:11.369912 1407 containerizer.cpp:888] Skipping resource statistic for container 5f33123b-00eb-4e05-9dcc-30f16f5eee44 because: Failed to get usage: No process found at 11093
W1202 16:18:11.369971 1407 containerizer.cpp:888] Skipping resource statistic for container 5f33123b-00eb-4e05-9dcc-30f16f5eee44 because: Failed to get usage: No process found at 11093
I1202 16:18:11.399648 1412 slave.cpp:2898] Executor 'executor_Task_Tracker_445' of framework 20141201-225046-698725789-5050-19765-0012 exited with status 1
I1202 16:18:11.401949 1412 slave.cpp:2215] Handling status update TASK_LOST (UUID: 959709c2-5546-41fd-9af3-09f024bb6354) for task Task_Tracker_445 of framework 20141201-225046-698725789-5050-19765-0012 from @0.0.0.0:0
W1202 16:18:11.402245 1409 containerizer.cpp:852] Ignoring update for unknown container: 5f33123b-00eb-4e05-9dcc-30f16f5eee44
I1202 16:18:11.403017 1410 status_update_manager.cpp:317] Received status update TASK_LOST (UUID: 959709c2-5546-41fd-9af3-09f024bb6354) for task Task_Tracker_445 of framework 20141201-225046-698725789-5050-19765-0012
I1202 16:18:11.403437 1406 slave.cpp:2458] Forwarding the update TASK_LOST (UUID: 959709c2-5546-41fd-9af3-09f024bb6354) for task Task_Tracker_445 of framework 20141201-225046-698725789-5050-19765-0012 to [email protected]:5050
I1202 16:18:11.448752 1409 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 959709c2-5546-41fd-9af3-09f024bb6354) for task Task_Tracker_445 of framework 20141201-225046-698725789-5050-19765-0012
I1202 16:18:11.449354 1408 slave.cpp:3007] Cleaning up executor 'executor_Task_Tracker_445' of framework 20141201-225046-698725789-5050-19765-0012
I1202 16:18:11.449707 1405 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20141201-225046-698725789-5050-19765-S4/frameworks/20141201-225046-698725789-5050-19765-0012/executors/executor_Task_Tracker_445/runs/5f33123b-00eb-4e05-9dcc-30f16f5eee44' for gc 6.99999479755852days in the future
I1202 16:18:11.450034 1409 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20141201-225046-698725789-5050-19765-S4/frameworks/20141201-225046-698725789-5050-19765-0012/executors/executor_Task_Tracker_445' for gc 6.9999947929037days in the future
I1202 16:18:11.450147 1408 slave.cpp:3084] Cleaning up framework 20141201-225046-698725789-5050-19765-0012
I1202 16:18:11.450213 1406 status_update_manager.cpp:279] Closing status update streams for framework 20141201-225046-698725789-5050-19765-0012
I1202 16:18:11.450381 1412 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20141201-225046-698725789-5050-19765-S4/frameworks/20141201-225046-698725789-5050-19765-0012' for gc 6.99999478812444days in the future
I1202 16:18:12.441505 1405 slave.cpp:1083] Got assigned task Task_Tracker_472 for framework 20141201-225046-698725789-5050-19765-0012
I1202 16:18:12.442337 1405 gc.cpp:84] Unscheduling '/tmp/mesos/slaves/20141201-225046-698725789-5050-19765-S4/frameworks/20141201-225046-698725789-5050-19765-0012' from gc
I1202 16:18:12.442617 1405 slave.cpp:1193] Launching task Task_Tracker_472 for framework 20141201-225046-698725789-5050-19765-0012
I1202 16:18:12.444263 1405 slave.cpp:3997] Launching executor executor_Task_Tracker_472 of framework 20141201-225046-698725789-5050-19765-0012 in work directory '/tmp/mesos/slaves/20141201-225046-698725789-5050-19765-S4/frameworks/20141201-225046-698725789-5050-19765-0012/executors/executor_Task_Tracker_472/runs/2310c642-02bf-401b-954c-876c88675c31'
I1202 16:18:12.444756 1405 slave.cpp:1316] Queuing task 'Task_Tracker_472' for executor executor_Task_Tracker_472 of framework '20141201-225046-698725789-5050-19765-0012
I1202 16:18:12.444793 1406 containerizer.cpp:424] Starting container '2310c642-02bf-401b-954c-876c88675c31' for executor 'executor_Task_Tracker_472' of framework '20141201-225046-698725789-5050-19765-0012'
I1202 16:18:12.447434 1406 launcher.cpp:137] Forked child with pid '11549' for container '2310c642-02bf-401b-954c-876c88675c31'
I1202 16:18:12.448652 1406 containerizer.cpp:571] Fetching URIs for container '2310c642-02bf-401b-954c-876c88675c31' using command '/opt/mesos-0.21.0/build/src/mesos-fetcher'

Upvotes: 1

Views: 1940

Answers (2)

Dyin

Reputation: 5366

I checked the executor logs (stderr in /tmp/mesos/slaves/...) and found out that JAVA_HOME was not set, so the hadoop dfs command used to fetch the executor could not run. The URI was perfect; it was JAVA_HOME that was missing. Additionally, I had to set HADOOP_HOME when starting the slaves.
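
In other words, the environment the slaves are started from has to carry both variables. A minimal sketch (the paths are illustrative, and the mesos-slave location is assumed from the build directory seen in the logs):

    # Sketch: environment for starting a slave; adjust paths to your install
    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    export HADOOP_HOME=/opt/hadoop-2.5.0-cdh5.2.0
    /opt/mesos-0.21.0/build/src/mesos-slave --master=dbpc41:5050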

Upvotes: 2

Adam

Reputation: 4322

Looks like the Mesos slave cannot fetch one of the URIs, likely the executor itself.

Did you upload your modified Hadoop-on-Mesos distribution (including the hadoop-mesos-0.0.8.jar) to hdfs://*.*.*.*:9000/hadoop-2.5.0-cdh5.2.0.tgz, as specified by mapred.mesos.executor.uri? Is it accessible from the slaves?
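
If not, uploading and verifying it looks roughly like this (a sketch; adjust the local tarball path to wherever you built it):

    # Upload the repackaged distribution to the path the executor URI points at
    hadoop fs -put hadoop-2.5.0-cdh5.2.0.tgz hdfs://*.*.*.*:9000/hadoop-2.5.0-cdh5.2.0.tgz
    # Then check from a slave that it is readable
    hadoop fs -ls hdfs://*.*.*.*:9000/hadoop-2.5.0-cdh5.2.0.tgz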

Upvotes: 1
