Reputation: 1001
I am running my Hadoop jobs on a cluster consisting of multiple machines whose specs are not known (main memory, number of cores, disk size, etc. per machine). Without using any OS-specific library (*.so files, I mean), is there any class or tool in Hadoop itself, or some additional library, with which I could collect this kind of information while the Hadoop MR jobs are being executed?
I don't have the hardware information or the specs of the cluster, which is why I want to collect this kind of information programmatically in my Hadoop code.
How can I achieve this? I want this kind of information for several reasons. One reason is the following error: I want to know which machine ran out of space.
12/07/17 14:28:25 INFO mapred.JobClient: Task Id : attempt_201205221754_0208_m_001087_0, Status : FAILED
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill2.out
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:376)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1247)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1155)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:582)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:649)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.
Upvotes: 1
Views: 6297
Reputation: 822
The ohai library (part of Opscode Chef) is superb; it will output a JSON dump of all sorts of stats from the machine.
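If ohai is installed on each node, one way to get at that dump from your own Hadoop code is simply to shell out to the CLI and capture its output. A minimal sketch (assuming the ohai binary is on the PATH; parsing the JSON is left out):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Runs the ohai CLI on the local node and prints its raw JSON dump.
public class OhaiDump {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("ohai");
        pb.redirectErrorStream(true);          // merge stderr into stdout
        Process p = pb.start();
        BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);      // JSON describing CPU, memory, disks, network, ...
            }
        } finally {
            reader.close();
        }
        p.waitFor();
    }
}

The same pattern works inside a mapper if you want each task to report on the node it happens to run on.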
There used to be a flag, mapred.max.maps.per.node, to limit the number of tasks any one job could run concurrently on one node, but it was removed. Boo. You would have to run a modified scheduler to provide that functionality.
Upvotes: 0
Reputation: 33495
The master node would have ssh access to all the slaves, and the list of all the nodes should be in the slaves file. So, write a script which iterates through the list of nodes in the slaves file and copies the files to the master using scp.
Something like the following script should work:
# Pull /proc/cpuinfo and /proc/meminfo from every node listed in the slaves file
for i in `cat /home/praveensripati/Installations/hadoop-0.21.0/conf/slaves`;
do
    scp praveensripati@$i:/proc/cpuinfo cpuinfo_$i
    scp praveensripati@$i:/proc/meminfo meminfo_$i
done
The host name/IP ($i) would be appended to the cpuinfo and meminfo file names. An MR job would be overkill for this task.
Upvotes: 1
Reputation: 30089
Assuming you are on a cluster that is deployed on Linux nodes, you can extract the CPU and memory information from the /proc/cpuinfo and /proc/meminfo files. You'll need to write a custom input format that ensures you touch each node in the cluster (or just process a text file with a split size small enough that each task tracker node gets at least one map task to execute).
You can output the information from the mapper as (hostname, info) pairs and dedup in the reducer.
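A rough sketch of that mapper/reducer pair, using the new (org.apache.hadoop.mapreduce) API; the class names NodeInfoMapper/NodeInfoReducer are made up for illustration, and the driver that sets the small split size and the input/output paths is omitted:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.InetAddress;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: ignores its actual input and emits (hostname, cpuinfo + meminfo) once per task.
public class NodeInfoMapper extends Mapper<LongWritable, Text, Text, Text> {
    private boolean emitted = false;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (emitted) {
            return;                               // one record per map task is enough
        }
        String host = InetAddress.getLocalHost().getHostName();
        String info = slurp("/proc/cpuinfo") + "\n" + slurp("/proc/meminfo");
        context.write(new Text(host), new Text(info));
        emitted = true;
    }

    // Read a small text file (such as the /proc pseudo-files) into a String.
    private static String slurp(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }
}

// Reducer: several map tasks may land on the same node, so keep one value per host.
class NodeInfoReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text host, Iterable<Text> infos, Context context)
            throws IOException, InterruptedException {
        for (Text info : infos) {
            context.write(host, info);            // first value wins; the rest are duplicates
            break;
        }
    }
}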
Note that /proc/cpuinfo reports logical processors (hardware threads, if you have a hyperthreading-capable CPU) rather than physical cores, so a 4-core hyperthreaded CPU will probably show 8 'processors' in /proc/cpuinfo.
Upvotes: 0