My YARN MapReduce job is taking a long time


Input file size: 75 GB

Number of mappers: 2273

Number of reducers: 1 (as shown in the web UI)

Number of splits: 2273

Number of input files: 867

Cluster: Apache Hadoop 2.4.0

5-node cluster, 1 TB each.

1 master and 4 DataNodes.
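One quick sanity check on the numbers above is to compute the average split and file sizes. This minimal sketch assumes 1 GB = 1024 MB and uses only the figures listed. It describes the input as a whole and says nothing about how sizes are distributed across individual files.

```python
# Figures taken from the question; 1 GB is assumed to be 1024 MB.
input_size_mb = 75 * 1024
num_splits = 2273
num_files = 867

avg_split_mb = input_size_mb / num_splits     # average data per map task
avg_file_mb = input_size_mb / num_files       # average input file size
splits_per_file = num_splits / num_files      # how many splits each file yields on average

print(f"average split size: {avg_split_mb:.1f} MB")
print(f"average file size:  {avg_file_mb:.1f} MB")
print(f"splits per file:    {splits_per_file:.2f}")
```

This prints an average split of about 33.8 MB, an average file of about 88.6 MB, and roughly 2.62 splits per file.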

It has been 4 hours and only 12% of the map phase has completed. Given my cluster configuration, is this expected, or is something wrong with the configuration?
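To put "12% after 4 hours" in perspective, a naive linear extrapolation gives the projected length of the whole map phase. This is a rough sketch that assumes progress advances at a constant rate. Real jobs often deviate from that because of stragglers, uneven split sizes, or data locality.

```python
# Naive linear extrapolation from the reported progress (assumes constant rate).
NUM_MAPS = 2273
elapsed_hours = 4.0
fraction_done = 0.12

projected_total_hours = elapsed_hours / fraction_done
remaining_hours = projected_total_hours - elapsed_hours
# Approximate only: the progress percentage also counts partially finished tasks.
maps_done = round(NUM_MAPS * fraction_done)

print(f"projected map phase:  {projected_total_hours:.1f} h")
print(f"remaining:            {remaining_hours:.1f} h")
print(f"maps finished so far: ~{maps_done}")
```

At this rate the map phase alone would take about 33 hours, with roughly 29 hours still to go.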

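yarn-site.xml:

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8025</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8040</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
        <description>The hostname of the RM.</description>
      </property>
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
        <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8192</value>
        <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
      </property>
      <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
        <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>32</value>
        <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
      </property>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
        <description>Physical memory, in MB, to be made available to running containers</description>
      </property>
      <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
        <description>Number of CPU cores that can be allocated for containers.</description>
      </property>
      <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>4</value>
      </property>
      <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
        <description>Whether virtual memory limits will be enforced for containers</description>
      </property>
    </configuration>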
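The 32 GB total reported in the RM web UI follows from this file: 4 NodeManagers × 8192 MB. The sketch below also bounds how many containers of a given size fit on one node. It assumes that only the 4 DataNodes run a NodeManager and that each request is rounded up to a multiple of `yarn.scheduler.minimum-allocation-mb`. The container sizes tried (1024, 2048, 4096 MB) are illustrative only, because the question does not state `mapreduce.map.memory.mb`, and the count ignores the container used by the MapReduce ApplicationMaster.

```python
import math

# Values copied from the yarn-site.xml above.
NM_MEMORY_MB = 8192   # yarn.nodemanager.resource.memory-mb
NM_VCORES = 4         # yarn.nodemanager.resource.cpu-vcores
MIN_ALLOC_MB = 1024   # yarn.scheduler.minimum-allocation-mb
NODE_MANAGERS = 4     # assumption: only the 4 worker nodes run a NodeManager

cluster_memory_mb = NODE_MANAGERS * NM_MEMORY_MB


def containers_per_node(container_mb, container_vcores=1, count_vcores=False):
    """Upper bound on concurrent containers on one NodeManager.

    Assumes each request is rounded up to a multiple of MIN_ALLOC_MB.
    With count_vcores=False only memory limits the count (memory-only
    resource calculator); with True, vcores are a second limit.
    """
    rounded_mb = math.ceil(container_mb / MIN_ALLOC_MB) * MIN_ALLOC_MB
    by_memory = NM_MEMORY_MB // rounded_mb
    if count_vcores:
        return min(by_memory, NM_VCORES // container_vcores)
    return by_memory


print(f"cluster memory: {cluster_memory_mb} MB ({cluster_memory_mb // 1024} GB)")
for size_mb in (1024, 2048, 4096):  # illustrative container sizes
    mem_only = containers_per_node(size_mb)
    with_cpu = containers_per_node(size_mb, count_vcores=True)
    print(f"{size_mb} MB containers: {mem_only}/node (memory only), "
          f"{with_cpu}/node (memory + vcores), "
          f"{NODE_MANAGERS * mem_only} cluster-wide (memory only)")
```

Whether vcores matter depends on which resource calculator the scheduler is configured with, which is why both bounds are shown.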

This is a MapReduce job that uses multiple outputs, so the reducer emits multiple files. Each machine has 15 GB of RAM. 8 containers are running, and the RM web UI shows 32 GB of total memory available.
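With only 8 containers running, the 2273 map tasks have to run in many sequential waves. This sketch computes the wave count under two assumptions (all 8 containers run map tasks, or one of them is the MapReduce ApplicationMaster). It also computes how much of each machine's 15 GB is not offered to YARN by `yarn.nodemanager.resource.memory-mb`.

```python
import math

# Figures from the question.
NUM_MAPS = 2273
RUNNING_CONTAINERS = 8                   # "8 containers are running"
RAM_PER_MACHINE_GB = 15
YARN_MEMORY_PER_NODE_GB = 8192 / 1024    # yarn.nodemanager.resource.memory-mb

# If all 8 containers run map tasks:
waves = math.ceil(NUM_MAPS / RUNNING_CONTAINERS)
# Assumption: one of the 8 is the ApplicationMaster, leaving 7 map slots.
waves_if_am_shares = math.ceil(NUM_MAPS / (RUNNING_CONTAINERS - 1))

# RAM per worker that YARN cannot hand out to containers
# (part of it is needed by the OS, DataNode, and NodeManager daemons).
unallocated_gb = RAM_PER_MACHINE_GB - YARN_MEMORY_PER_NODE_GB

print(f"map waves: {waves} (or {waves_if_am_shares} if the AM takes a slot)")
print(f"RAM per worker not given to YARN: {unallocated_gb:.1f} GB")
```

This gives 285 waves (325 if the ApplicationMaster occupies one of the 8 containers) and 7 GB per worker outside YARN's allocation.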

Any guidance is appreciated. Thanks in advance.
