user145610

Reputation: 3025

How to increase the number of mappers and reducers in Apache Tez

I know this is a simple question, but I need some help with this query from the community. I created a partitioned table in ORC format. When I insert data into it from a non-partitioned table that points to a 2 GB file with 210 columns, I see 2 mappers and 2 reducers. Is there a way to increase the number of mappers and reducers? My assumption is that we can't set the number of mappers and reducers as in MR 1.0, and that it depends instead on settings such as the YARN container size and the mapper minimum and maximum memory. Can anyone explain how Tez calculates the number of mappers and reducers?

Also, what are the best values for the memory settings so that I don't run into "Java heap space" / Java out-of-memory errors? My file may grow up to 100 GB. Please help me with this.

Upvotes: 1

Views: 2740

Answers (2)

hadoop_user

Reputation: 181

For the memory settings: if you are using Hive with Tez, the following two settings will be useful to you:

1) hive.tez.container.size - the size of the YARN container that will be used (value in MB).

2) hive.tez.java.opts - the Java options used for each task. If the container size is set to 1024 MB, set the Java opts to something like "-Xmx800m", not "-Xmx1024m". YARN kills processes that use more memory than the specified container size, and since a Java process's memory footprint usually exceeds its -Xmx value, setting -Xmx to the same value as the container size usually leads to problems.
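A minimal sketch of the sizing rule above, assuming a heap of roughly 80% of the container (in line with the 800 MB heap for a 1024 MB container in the example; the 0.8 ratio is an assumption, not a Hive default). It only builds the Hive `SET` statements you might run in a session before the insert; the function name `tez_memory_settings` is hypothetical.

```python
# Assumed heap-to-container ratio (~80%), matching the answer's example of
# -Xmx800m inside a 1024 MB container. This is NOT a Hive default.
HEAP_FRACTION = 0.8


def tez_memory_settings(container_mb: int, heap_fraction: float = HEAP_FRACTION) -> list[str]:
    """Return Hive SET statements for a Tez container of container_mb MB.

    The JVM heap (-Xmx) is kept below the container size so that the
    non-heap JVM overhead still fits and YARN does not kill the container.
    """
    if container_mb <= 0:
        raise ValueError("container size must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(container_mb * heap_fraction)  # truncate to whole MB
    return [
        f"SET hive.tez.container.size={container_mb};",
        f"SET hive.tez.java.opts=-Xmx{heap_mb}m;",
    ]


if __name__ == "__main__":
    for stmt in tez_memory_settings(1024):
        print(stmt)
```

For a 1024 MB container this yields `-Xmx819m`; the right container size for a 100 GB input still depends on your cluster's YARN limits and the query itself.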

Upvotes: 1

WestCoastProjects

Reputation: 63062

You can still set the number of mappers and reducers in YARN. Have you tried that? If so, please report back here.

YARN changes the underlying execution mechanism, but the number of mappers and reducers describes the job's requirements, not the way the job's resources are allocated (which is where YARN and MRv1 differ).

Traditional MapReduce has a hard-coded number of map and reduce "slots". As you say, YARN uses containers, which are per-application, so YARN is more flexible. But the number of mappers and reducers is an input to the job in both cases. Also, in both cases the actual number of mappers and reducers may differ from the requested number. Typically the number of reducers will be either

  • (a) precisely the number that was requested
  • (b) exactly ONE reducer, if the job requires it, as in total ordering
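A simplified sketch of how the reducer count tends to end up at (a) or (b), plus a size-based fallback when no count is requested. The fallback mirrors the idea behind Hive's `hive.exec.reducers.bytes.per.reducer` and `hive.exec.reducers.max` settings, but the function and parameter names here are hypothetical, and Hive on Tez can also adjust the count at runtime (for example with auto reducer parallelism), which this sketch ignores.

```python
def choose_reducers(
    input_bytes: int,
    bytes_per_reducer: int,
    max_reducers: int,
    requested: int = -1,
    total_order: bool = False,
) -> int:
    """Hypothetical model of picking a reducer count for one job."""
    # (b) A job that needs a total ordering is forced to a single reducer.
    if total_order:
        return 1
    # (a) An explicitly requested count is used as-is.
    if requested > 0:
        return requested
    # Fallback: one reducer per bytes_per_reducer of input (ceiling division),
    # at least 1 and at most max_reducers.
    estimate = -(-input_bytes // bytes_per_reducer)
    return max(1, min(max_reducers, estimate))


if __name__ == "__main__":
    gb, mb = 1024**3, 1024**2
    print(choose_reducers(2 * gb, 256 * mb, 1009))                   # 2 GB input
    print(choose_reducers(100 * gb, 256 * mb, 1009))                 # 100 GB input
    print(choose_reducers(2 * gb, 256 * mb, 1009, total_order=True))
```

With an assumed 256 MB per reducer, a 2 GB input gives 8 reducers and a 100 GB input gives 400, while a total-ordering job stays at 1 regardless of size.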

Upvotes: 1
