rajibdotnet
rajibdotnet

Reputation: 1596

Relationship between a HIVE query and the number of mapreducers provided by Hadoop?

I am executing a query in HIVE shell as

SELECT tradeId, bookid, foid from trades where bookid='"ABCDEFG"'

The table "trades" has index on bookid. When the query runs, it shows the details of Mappers and Reducers as follows :-

Number of reduce tasks is set to 0 since there's no reduce operator
Hadoop job information for Stage-1: number of mappers: 48; number of reducers: 0
Time taken: **606.183 seconds**, Fetched: **18 row(s)**

If you see it took enormous amount of time to fetch just 18 rows. My question is what I am doing wrong here ? Should the recuder be non-zero ? Will it help if I set it using

set mapred.reduce.tasks = some_number

Shouldn't the indexes help retrieve the data faster ?

Upvotes: 0

Views: 550

Answers (1)

Arijit Banerjee
Arijit Banerjee

Reputation: 164

When you are doing simple select, all the filtering thing and column selection are done by the mappers itself. There is no purpose for reducer task here, hence number of reducer is zero - which is fine. You probably have around 48*block size amount of data in your table so it spawned 48 mappers. How many map slot per DN do you have and how many of them were free when you fired your query? Chances are all 48 of them are not running in parallel. Though it returned only 18 rows, it read the full table. Is your table bucketed and clustered on the bookid column - in that case you may use TABLESAMPLE clause to make it read only the buckets that contain your ABCDEFG value.

Upvotes: 1

Related Questions