Reputation: 436
I've configured our Hadoop cluster with mapred_map_tasks_max set to 6, and as expected, I see 6 mapred processes running when kicking off Pig jobs.
I am, however, a bit surprised to see the CPU usage on some of these individual processes exceed 100%, sometimes reaching 1000%+. Does MapReduce default to multiple threads? Could this be something in Pig itself?
All I could find online was some information about a setting (mapred.map.runner.class), but it doesn't appear to be set to a multi-threaded runner in any way.
Thanks.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2630 mapred 20 0 53.4g 2.8g 12m S 218.1 4.5 1:17.32 java
2553 mapred 20 0 53.4g 2.8g 12m S 110.7 4.5 1:25.07 java
2636 mapred 20 0 53.4g 2.8g 12m S 110.4 4.5 1:11.58 java
2437 mapred 20 0 53.5g 5.6g 12m S 108.1 8.8 3:46.52 java
2353 mapred 20 0 53.5g 5.2g 12m S 101.1 8.3 3:35.27 java
2239 mapred 20 0 53.5g 5.8g 12m S 82.6 9.3 3:54.47 java
Upvotes: 1
Views: 1875
Reputation: 9073
It is possible with Hadoop to use a multi-threaded mapper (see http://kickstarthadoop.blogspot.com/2012/02/enable-multiple-threads-in-mapper-aka.html). As far as I know, Pig doesn't support multi-threaded jobs (although you can call Pig servers from multiple threads... https://issues.apache.org/jira/browse/PIG-240).
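For reference, enabling a multi-threaded mapper with the old (mapred) API, which is what the mapred.map.runner.class property in the question belongs to, is just a matter of job configuration. A minimal sketch (MyJob and MyMapper are hypothetical stand-ins for your own classes):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultithreadedMapRunner;

// Old (mapred) API. Setting the map runner class here is what
// would make mapred.map.runner.class show up as multi-threaded;
// if it was never set, map tasks use the default single-threaded
// MapRunner.
JobConf conf = new JobConf(MyJob.class);
conf.setMapperClass(MyMapper.class);            // your own Mapper
conf.setMapRunnerClass(MultithreadedMapRunner.class);
// Number of map threads per task JVM for MultithreadedMapRunner.
conf.setInt("mapred.map.multithreadedrunner.threads", 4);
```

Since the question's cluster doesn't set this, the high %CPU is unlikely to come from a multi-threaded map runner.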
That said, Pig will by default run multiple mappers/reducers on the same host, one mapper/reducer per available core.
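Also note that %CPU in top is summed over all threads of a process, and even a nominally single-threaded map task JVM carries extra housekeeping threads (GC, JIT compiler, RPC), which can push one java process past 100% on a multi-core box. On Linux you can count a task's threads directly; substitute a mapper PID from the top output above (e.g. 2630) — the shell's own PID ($$) is used here only so the command runs as-is:

```shell
# /proc/<pid>/status reports the thread count of a process.
PID=$$
grep '^Threads:' /proc/"$PID"/status
```

Running `top -H -p <pid>` similarly breaks the aggregate %CPU down into per-thread lines.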
Upvotes: 2