charmquark

Reputation: 436

Is the mapred process in Hadoop multi-threaded?

I've configured our Hadoop cluster with mapred_map_tasks_max set to 6 and, as expected, I see 6 mapred processes running when kicking off Pig jobs.

I am, however, a bit surprised to see the CPU usage of some of these individual processes exceed 100%, sometimes reaching 1000%+. Does MapReduce default to multiple threads? Could this be something in Pig itself?

All I could find online was some information about a setting (mapred.map.runner.class), but it doesn't appear to be set to MultiThreaded in any way.

Thanks.

 PID  USER    PR  NI  VIRT   RES   SHR  S  %CPU   %MEM  TIME+    COMMAND
 2630 mapred  20  0   53.4g  2.8g  12m  S  218.1  4.5   1:17.32  java
 2553 mapred  20  0   53.4g  2.8g  12m  S  110.7  4.5   1:25.07  java
 2636 mapred  20  0   53.4g  2.8g  12m  S  110.4  4.5   1:11.58  java
 2437 mapred  20  0   53.5g  5.6g  12m  S  108.1  8.8   3:46.52  java
 2353 mapred  20  0   53.5g  5.2g  12m  S  101.1  8.3   3:35.27  java
 2239 mapred  20  0   53.5g  5.8g  12m  S  82.6   9.3   3:54.47  java

Upvotes: 1

Views: 1875

Answers (1)

DMulligan

Reputation: 9073

It is possible with Hadoop to use a multithreaded mapper (see http://kickstarthadoop.blogspot.com/2012/02/enable-multiple-threads-in-mapper-aka.html). As far as I know, Pig doesn't support multithreaded jobs (although you can call Pig servers from multiple threads; see https://issues.apache.org/jira/browse/PIG-240).

That said, by default Pig will run multiple mappers/reducers on the same host, one mapper/reducer per available core.
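
Separately from how the mapper is configured, any JVM runs several threads of its own, and top adds up CPU across all threads of a process, so a single java process can show more than 100% on a multi-core host. As a rough way to see this, here is a minimal sketch that uses only the Java standard library to list the Java-level threads of the JVM it runs in. The class name JvmThreads is made up for illustration, and nothing here is Hadoop- or Pig-specific; it would have to be run inside the task JVM to say anything about a particular mapred process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of Hadoop or Pig.
public class JvmThreads {

    // Sorted names of all live Java-level threads in this JVM.
    public static List<String> liveThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        Collections.sort(names);
        return names;
    }

    // Current number of live threads as reported by the JVM's thread MXBean.
    public static int liveThreadCount() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        return bean.getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreadCount());
        for (String name : liveThreadNames()) {
            System.out.println("  " + name);
        }
    }
}
```

Native garbage-collector and JIT compiler threads do not show up in this listing. For an already running task, `top -H -p <pid>` or `jstack <pid>` gives a per-thread view from the outside.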

Upvotes: 2
