Reputation: 431
I'm importing a SQL database into a Hive database on a Hive client node (using the Hortonworks Data Platform) with this bash command:
$ hive -f tables.sql
I get this error:
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.1.0-129/0/hive-log4j.properties
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:429)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:718)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
I tried increasing HADOOP_HEAPSIZE from 1 GB to 4 GB, but I still get the error. Any ideas?
Upvotes: 1
Views: 340
Reputation: 9844
The OutOfMemoryError comes from the Hive codebase, in CliDriver#processReader(BufferedReader):
public int processReader(BufferedReader r) throws IOException {
  String line;
  StringBuilder qsb = new StringBuilder();

  while ((line = r.readLine()) != null) {
    // Skipping through comments
    if (! line.startsWith("--")) {
      qsb.append(line + "\n");
    }
  }

  return (processLine(qsb.toString()));
}
It appends every line read from the file to a single StringBuilder before executing it, so the entire script is held in memory at once. This suggests that the input file you specified is very large. Is it possible to split it into multiple smaller files and execute each one separately, so that the memory footprint is reduced?
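One way to do that split is with the standard split utility. A minimal sketch, with some assumptions: the 10000-line chunk size is arbitrary (tune it to your file), the file names are placeholders, and this demonstrates on a generated stand-in file rather than your real tables.sql. Note that a line-based split can cut a multi-line statement in half, so make sure chunk boundaries fall between statements.

```shell
# Stand-in for the real tables.sql (replace with your actual file).
printf 'SELECT %s;\n' $(seq 1 25000) > tables.sql

# Break the script into 10000-line chunks: tables.part.aa, .ab, .ac, ...
split -l 10000 tables.sql tables.part.

# Run each chunk on its own so the CLI never buffers the whole script.
for part in tables.part.*; do
  : hive -f "$part"   # ':' makes this a no-op here; drop it to really run
done
```

Each `hive -f` invocation then only ever holds one chunk in the StringBuilder, instead of the whole script.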
You mentioned this is an import of a SQL database. Apache Sqoop might be a better fit for that use case.
Upvotes: 2