Reputation: 1596
I am executing a query in HIVE shell as
SELECT tradeId, bookid, foid from trades where bookid='"ABCDEFG"'
The table "trades" has index on bookid. When the query runs, it shows the details of Mappers and Reducers as follows :-
Number of reduce tasks is set to 0 since there's no reduce operator
Hadoop job information for Stage-1: number of mappers: 48; number of reducers: 0
Time taken: **606.183 seconds**, Fetched: **18 row(s)**
If you see it took enormous amount of time to fetch just 18 rows. My question is what I am doing wrong here ? Should the recuder be non-zero ? Will it help if I set it using
set mapred.reduce.tasks = some_number
Shouldn't the indexes help retrieve the data faster ?
Upvotes: 0
Views: 550
Reputation: 164
When you are doing simple select, all the filtering thing and column selection are done by the mappers itself. There is no purpose for reducer task here, hence number of reducer is zero - which is fine. You probably have around 48*block size amount of data in your table so it spawned 48 mappers. How many map slot per DN do you have and how many of them were free when you fired your query? Chances are all 48 of them are not running in parallel. Though it returned only 18 rows, it read the full table. Is your table bucketed and clustered on the bookid column - in that case you may use TABLESAMPLE
clause to make it read only the buckets that contain your ABCDEFG value.
Upvotes: 1