How to increase Mappers and Reducer in Apache TEZ

Question

I know this simple question, I need some help on this query from this community, When I create PartitionTable with ORC format, When I try to dump data from non partition table which is pointing to 2 GB File with 210 columns, I see Number of Mapper are 2 and reducer are 2 . is there a way to increase Mapper and reducer. My assumption is we cant set number of Mapper and reducer like MR 1.0, It is based on Settings like Yarn container size, Mapper minimum memory and maximum memory . can any one suggest me TEz Calculates mappers and reducers. What is best value to keep memory size setting, so that i dont come across : Java heap space, Java Out of Memory problem. My file size may grow upto 100GB. Please help me on this.

WestCoastProjects · Accepted Answer

You can still set the number of mappers and reducers in Yarn. Have you tried that? If so, please get back here.

Yarn changes the underlying execution mechanism, but #mappers and #reducers is describing the Job requirements - not the way the job resources are allocated (which is how yarn and mrv1 differ).

Traditional Map/Reduce has a hard coded number of map and reduce "slot". As you say - Yarn uses containers - which are per-application. Yarn is thus more flexible. But the #mappers and #reducers are inputs of the job in both cases. And also in both cases the actual number of mappers and reducers may differ from the requested number. Typically the #reducers would either be

(a) precisely the number that was requested
(b) exactly ONE reducer - that is if the job required it such as in total ordering

How to increase Mappers and Reducer in Apache TEZ

Answers (2)

Related Questions