Suvo

Reducer Selection in Hive

I have the following record set to process:

 1000, 1001, 1002 to 1999,
 2000, 2001, 2002 to 2999,
 3000, 3001, 3002 to 3999

I want to process these records with Hive so that reducer 1 processes 1000 to 1999, reducer 2 processes 2000 to 2999, and reducer 3 processes 3000 to 3999. How can I achieve this?

Answers (1)

leftjoin

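Use DISTRIBUTE BY. The mappers' output is partitioned on the DISTRIBUTE BY expression before it is transferred to the reducers, so all rows with the same value of that expression are processed by the same reducer. Note that this only guarantees that equal keys end up together; which reducer receives which key is decided by hashing the key modulo the number of reducers: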

select ...
  from ...
distribute by case when col between 1000 and 1999 then 1
                   when col between 2000 and 2999 then 2
                   when col between 3000 and 3999 then 3
               end
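To see how the three ranges get spread over reducers, here is a small Python simulation (not Hive code). It assumes the job runs with exactly 3 reducers and uses hash-modulo partitioning in which an INT key hashes to its own value and a NULL key hashes to 0. As far as I know this matches classic Hive on MapReduce, but other versions or engines may hash differently. `case_key`, `partition_for` and `assign` are hypothetical helpers.

```python
from collections import defaultdict

NUM_REDUCERS = 3  # assumption: the job is configured with exactly 3 reducers


def case_key(col):
    """Mirror of the CASE expression: one key per range, None (NULL) otherwise."""
    if 1000 <= col <= 1999:
        return 1
    if 2000 <= col <= 2999:
        return 2
    if 3000 <= col <= 3999:
        return 3
    return None


def partition_for(key, num_reducers=NUM_REDUCERS):
    """Hash-modulo partitioning. Assumption: an INT key hashes to itself, NULL hashes to 0."""
    h = 0 if key is None else key
    return (h & 0x7FFFFFFF) % num_reducers


def assign(rows):
    """Group row values by the reducer index they would be sent to."""
    buckets = defaultdict(list)
    for col in rows:
        buckets[partition_for(case_key(col))].append(col)
    return dict(buckets)


if __name__ == "__main__":
    buckets = assign(range(1000, 4000))
    for reducer in sorted(buckets):
        vals = buckets[reducer]
        print(f"reducer {reducer}: {min(vals)}..{max(vals)} ({len(vals)} rows)")
```

Under these assumptions each range lands on its own reducer, but 3000 to 3999 goes to reducer index 0 rather than a "reducer 3": the key values decide which rows travel together, not the numbering of the reducers.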

Or simply

distribute by floor(col/1000)
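A quick way to confirm that the shorter expression groups the data the same way is to mirror both expressions in Python and compare them over the data range. This is a sketch, not Hive code. `floor_key` and `range_key` are hypothetical helpers, and `col` is assumed to be an integer column. In Hive, `/` on integers returns a DOUBLE, so true division is used here.

```python
import math

RANGES = {1: (1000, 1999), 2: (2000, 2999), 3: (3000, 3999)}


def floor_key(col: int) -> int:
    # Mirrors floor(col/1000): floating-point division, then floor.
    return math.floor(col / 1000)


def range_key(col: int):
    # Mirrors the CASE expression: one key per thousand-range, None (NULL) otherwise.
    for key, (lo, hi) in RANGES.items():
        if lo <= col <= hi:
            return key
    return None


if __name__ == "__main__":
    same = all(floor_key(c) == range_key(c) for c in range(1000, 4000))
    print("keys agree on 1000..3999:", same)
    # Outside the three ranges the two expressions diverge.
    for c in (999, 4000):
        print(c, floor_key(c), range_key(c))
```

The two expressions only differ for values outside 1000 to 3999. The floor version gives such rows their own keys (0, 4, ...), while the CASE version maps all of them to NULL, i.e. one shared group.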
