Reputation: 2148
I am new to Hive and Hadoop and just created a table (ORC file format) in Hive. I am now trying to create a bitmap index on that table. Every time I run the index build query, Hive starts a MapReduce job to build the index. At some point the MapReduce job hangs, and one of my nodes fails (a randomly different one across retries, so it's probably not the node itself). I tried increasing mapreduce.child.java.opts to 2048 MB, but that gave me errors about using more memory than available, so I increased mapreduce.map.memory.mb and mapreduce.reduce.memory.mb to 8 GB. All other configurations are left at their defaults.
Any help with what configurations I am missing out would be really appreciated.
Just for context, I am trying to index a table with 2.4 Billion rows, which is 450GB in size and has 3 partitions.
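For reference, this is roughly what I am running. The table, index, and column names are placeholders; the memory values are the ones described above, and the heap sizes are my guess at ~80% of the container size so the JVM fits inside it:

```sql
-- Container sizes (as set above)
SET mapreduce.map.memory.mb=8192;
SET mapreduce.reduce.memory.mb=8192;
-- JVM heap must stay below the container size, e.g. ~80% of it
SET mapreduce.map.java.opts=-Xmx6553m;
SET mapreduce.reduce.java.opts=-Xmx6553m;

-- Hypothetical table/column names
CREATE INDEX my_table_bitmap_idx
ON TABLE my_table (my_col)
AS 'BITMAP'
WITH DEFERRED REBUILD;

-- This is the step that launches the MapReduce job that hangs
ALTER INDEX my_table_bitmap_idx ON my_table REBUILD;
```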
Upvotes: 1
Views: 281
Reputation: 7138
First, please confirm that the indexing works on a small sample of the data. Assuming it does, how Hive runs its MapReduce jobs depends on several factors:
1. The type of query (for example, count(*) versus a plain SELECT *).
2. The amount of memory allocated to each reducer during the execution phase (controlled by the hive.exec.reducers.bytes.per.reducer property).
In your case it is likely the second point. Given the scale at which you are running, please calculate the memory requirements accordingly. This post has more information. Happy learning and coding.
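As a sketch of how that property affects the job (the exact values are assumptions you would need to tune for your cluster): Hive divides the input size by hive.exec.reducers.bytes.per.reducer to decide how many reducers to launch, so for a 450 GB table the arithmetic looks roughly like this:

```sql
-- Bytes of input handled per reducer; raising this reduces the reducer
-- count (and the per-reducer memory pressure goes up correspondingly)
SET hive.exec.reducers.bytes.per.reducer=1073741824;  -- 1 GB (example value)

-- Rough reducer count for this input:
--   450 GB / 1 GB per reducer ≈ 450 reducers
-- Optionally cap the total (hypothetical value):
SET hive.exec.reducers.max=500;
```

If each of those reducers also has to fit inside the 8 GB containers mentioned in the question, the bytes-per-reducer value and the container/heap sizes need to be chosen together.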
Upvotes: 2