Causality

Reputation: 1123

Creating an index in Hive 0.9

I am trying to create indexes on tables in Hive 0.9. One table has 1 billion rows, another has 30 million rows. Apart from creating the tables and so on, the commands I used are:

  CREATE INDEX DEAL_IDX_1 ON TABLE DEAL (ID) AS 
  'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
  WITH DEFERRED REBUILD;

  alter index DEAL_IDX_1 ON DEAL rebuild;

  set hive.optimize.autoindex=true;
  set hive.optimize.index.filter=true;

For the 30-million-row table, the rebuild looks fine (the mappers and reducers both finish) until, at the very end, it prints

  Invalid alter operation: Unable to alter index.
  FAILED: Execution Error, return code 1 
  from org.apache.hadoop.hive.ql.exec.DDLTask

The log showed this error:

java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver

I'm not sure why this error occurred, but I added the Derby jar anyway:

add jar /path/derby-version.jar;

That resolved the reported error, but I then got another one:

org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
StatsPublishing error: cannot connect to database

I'm not sure how to solve this. I do see the created index table under hive/warehouse, though.

The 1-billion-row table is another story. The mappers get stuck at about 2%, and this error shows up:

FATAL org.apache.hadoop.mapred.Child: Error running child : 
java.lang.OutOfMemoryError: Java heap space 

I tried raising the maximum heap size as well as the map and reduce task memory limits (I saw these settings mentioned elsewhere, but not among Hive's configuration settings):

set mapred.child.java.opts=-Xmx6024m;
set mapred.job.map.memory.mb=6000;
set mapred.job.reduce.memory.mb=4000;

However, this did not help. The mappers still get stuck at 2% with the same error.

Upvotes: 0

Views: 7435

Answers (1)

Alex Vertlieb

Reputation: 83

I had a similar problem: the index table was created under hive/warehouse, but the process as a whole failed. My index_name was TypeTarget (yours is DEAL_IDX_1), and after many days of trying different approaches, making the index_name all lowercase (typetarget) fixed the issue. My problem was in Hive 0.10.0.
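
Applied to the question's index, the lowercase workaround might look like the sketch below. The name deal_idx_1 is simply DEAL_IDX_1 lowercased, and the DROP INDEX step is an assumption (it is only needed if the failed attempt left the old index registered). This has not been checked against Hive 0.9 specifically.

```sql
-- Assumption: remove the index left behind by the failed attempt, if any.
DROP INDEX IF EXISTS DEAL_IDX_1 ON DEAL;

-- Same statements as in the question, with only the index name lowercased.
CREATE INDEX deal_idx_1 ON TABLE DEAL (ID) AS
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

ALTER INDEX deal_idx_1 ON DEAL REBUILD;
```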

Also, the ClassNotFoundException and the StatsPublishing error occur because hive.stats.autogather is turned on by default. Turning it off (setting it to false) in hive-site.xml should get rid of both.
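
To try this before editing hive-site.xml, the same property can be set for the current session only. This is a sketch that assumes a session-level override is enough for the rebuild job; the permanent fix is still the hive-site.xml change.

```sql
-- Session-level override of the property named above; lasts only for this Hive session.
set hive.stats.autogather=false;

-- Re-run the rebuild from the question with stats gathering disabled.
ALTER INDEX DEAL_IDX_1 ON DEAL REBUILD;
```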

Hopefully this helps anyone looking for a quick fix.

Upvotes: 2
