Reputation: 431
I'm importing a SQL database into a Hive database on a Hive client node (using the Hortonworks Data Platform) with this bash command:
$ hive -f tables.sql
I get this error:
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.1.0-129/0/hive-log4j.properties
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:429)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:718)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
I tried increasing HADOOP_HEAPSIZE from 1 GB to 4 GB, but I still get the error. Any ideas?
Upvotes: 1
Views: 340
Reputation: 9844
The OutOfMemoryError comes from the Hive codebase, in CliDriver#processReader(BufferedReader):
public int processReader(BufferedReader r) throws IOException {
  String line;
  StringBuilder qsb = new StringBuilder();

  while ((line = r.readLine()) != null) {
    // Skipping through comments
    if (! line.startsWith("--")) {
      qsb.append(line + "\n");
    }
  }

  return (processLine(qsb.toString()));
}
It appends every line read from the file to a single StringBuilder before executing it, so the entire script is held in memory at once. This suggests that the input file you specified is very large. Is it possible to split it into multiple smaller files and execute each one separately, so that the memory footprint is reduced?
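One way to do that split is with the standard split utility. A minimal sketch, with some assumptions: the 10000-line chunk size is arbitrary (tune it to your file), the file names are placeholders, and this demonstrates on a generated stand-in file rather than your real tables.sql. Note that a line-based split can cut a multi-line statement in half, so make sure chunk boundaries fall between statements.

```shell
# Stand-in for the real tables.sql (replace with your actual file).
printf 'SELECT %s;\n' $(seq 1 25000) > tables.sql

# Break the script into 10000-line chunks: tables.part.aa, .ab, .ac, ...
split -l 10000 tables.sql tables.part.

# Run each chunk on its own so the CLI never buffers the whole script.
for part in tables.part.*; do
  : hive -f "$part"   # ':' makes this a no-op here; drop it to really run
done
```

Each `hive -f` invocation then only ever holds one chunk in the StringBuilder, instead of the whole script.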
You mentioned this is an import of a SQL database. Apache Sqoop might be a better fit for that use case.
Upvotes: 2