Bob

Reputation: 1001

How to collect Hadoop Cluster Size/Number of Cores Information

I am running my Hadoop jobs on a cluster of machines whose specs are not known to me (main memory, number of cores, storage, etc. per machine). Without using any OS-specific libraries (*.so files, I mean), is there any class or tool, in Hadoop itself or in some additional library, with which I could collect information like the following while the Hadoop MR jobs are executing:

  1. Total Number of cores / number of cores employed by the job
  2. Total available main memory / allocated available main memory
  3. Total Storage space on each machine/allocated storage space

I don't have the hardware information or the specs of the cluster, which is why I want to collect this kind of information programmatically in my Hadoop code.

How can I achieve this? I want this kind of information for several reasons. One reason is given by the following error: I want to know which machine ran out of space.

12/07/17 14:28:25 INFO mapred.JobClient: Task Id : attempt_201205221754_0208_m_001087_0, Status : FAILED
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill2.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:376)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1247)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1155)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:582)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:649)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.
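For reference, the per-node check I'd like to automate looks roughly like this (a sketch assuming POSIX `df`; the path would be each node's mapred.local.dir, which I don't know yet):

```shell
# Print available kilobytes for a given directory -- the figure the
# DiskChecker needs. 'df -P' guarantees one POSIX-format line per
# filesystem; column 4 is available space in KB.
check_free_kb() {
  df -P "$1" | awk 'NR==2 {print $4}'
}

check_free_kb /tmp   # substitute each node's mapred.local.dir here
```

Run over ssh against every slave, this would show which node has exhausted its local directories.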

Upvotes: 1

Views: 6297

Answers (3)

mrflip

Reputation: 822

The ohai library (part of Opscode Chef) is superb; it will output a JSON dump of all sorts of stats from the machine.

There used to be a flag -- mapred.max.maps.per.node -- to limit the number of tasks any one job could run concurrently on one node, but it was removed. Boo. You would have to run a modified Scheduler to provide that functionality.

Upvotes: 0

Praveen Sripati

Reputation: 33495

The master node has ssh access to all the slaves, and the list of all the nodes is in the slaves file. So, write a script that iterates through the list of nodes in the slaves file and copies the files to the master using scp.

Something like this script should work:

for i in $(cat /home/praveensripati/Installations/hadoop-0.21.0/conf/slaves); do
  scp praveensripati@$i:/proc/cpuinfo cpuinfo_$i
  scp praveensripati@$i:/proc/meminfo meminfo_$i
done

The host name/IP ($i) is appended to the cpuinfo and meminfo file names. An MR job would be overkill for this task.
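Once the files are on the master, a short summary can be pulled out of them; this sketch assumes the `cpuinfo_<host>`/`meminfo_<host>` naming used by the loop above:

```shell
# Summarize cores and RAM per node from the collected /proc dumps.
summarize_nodes() {
  for f in cpuinfo_*; do
    [ -e "$f" ] || continue                                  # nothing collected yet
    host=${f#cpuinfo_}
    cores=$(grep -c '^processor' "$f")                       # logical CPUs
    mem_kb=$(awk '/^MemTotal:/ {print $2}' "meminfo_$host")  # total RAM in KB
    echo "$host: $cores cores, ${mem_kb} KB RAM"
  done
}

summarize_nodes
```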

Upvotes: 1

Chris White

Reputation: 30089

Assuming you are on a cluster that is deployed on Linux nodes, you can extract the CPU and memory information from the /proc/cpuinfo and /proc/meminfo files. You'll need to write a custom input format that ensures you touch each node in the cluster (or just process a text file with a split size that generates enough map tasks to ensure that each task tracker node gets at least one task to execute).

You can output the information from the mapper as (hostname, info) pairs and dedup in the reducer.

Note that cpuinfo reports the number of hyperthreaded (logical) cores, if you have a compatible CPU, rather than the number of physical cores, so a 4-core hyperthreaded CPU will show 8 'processors' in /proc/cpuinfo.
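The physical-core count can be recovered by pairing the `physical id` and `core id` fields; a sketch that operates on cpuinfo-formatted text on stdin (field spacing here is illustrative):

```shell
# Logical CPUs: one 'processor' entry per hardware thread.
count_logical() { grep -c '^processor'; }

# Physical cores: unique (physical id, core id) pairs -- hyperthread
# siblings share both fields, so they collapse under 'sort -u'.
count_physical() {
  awk -F': ' '/^physical id/ {p=$2} /^core id/ {print p "," $2}' | sort -u | wc -l
}
```

Feed them with `count_logical < /proc/cpuinfo` and `count_physical < /proc/cpuinfo` on each node.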

Upvotes: 0
