Reputation: 322
Summary: Trying to insert a Spark DataFrame into a Hive table causes a loop of errors and corrupts the Hive metastore.
Details:
The error loop:
import org.apache.spark.sql.SaveMode // needed in spark-shell for SaveMode.Overwrite
df.show(5)
df.write.mode(SaveMode.Overwrite).saveAsTable("dbnamexxx.tablenamexxx")
Yields:
+---+---+------+---+-------+-------------------+---------------+-------------+--------+
| zz|zzz|zzzzzz| zz|zzzzzzz| zzzzz_zzzz|zzzzzzzzzz_zzzz|zz_zzzzzzzzzz|zz_zzzzz|
+---+---+------+---+-------+-------------------+---------------+-------------+--------+
|833| 13| 1| 19| 477|2017-11-00 00000000| null| 0| 29|
|833| 3| 1| 13| 280|2017-11-00 00000000| null| 0| 29|
|833| 9| 1| 13| 442|2017-11-00 00000000| null| 0| 29|
|833| 3| 1| 19| 173|2017-11-00 00000000| null| 0| 29|
|833| 14| 1| 17| 360|2017-11-00 00000000| null| 0| 29|
+---+---+------+---+-------+-------------------+---------------+-------------+--------+
(Included just to show that the DataFrame itself is fine)
Then the error (which repeats itself every ~2 seconds):
[Stage 5:===> (13 + 4) / 200]2018-03-25 01:12:53 WARN DFSClient:611 - Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
2018-03-25 01:13:04 WARN Persist:96 - Insert of object "org.apache.hadoop.hive.metastore.model.MTable@5e251945" using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,VIEW_EXPANDED_TEXT,SD_ID,OWNER,TBL_TYPE,LAST_ACCESS_TIME,VIEW_ORIGINAL_TEXT,TBL_NAME,DB_ID,RETENTION) VALUES (?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED' cannot accept a NULL value.
2018-03-25 01:13:04 ERROR RetryingHMSHandler:173 - Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDODataStoreException: Insert of object "org.apache.hadoop.hive.metastore.model.MTable@5e251945" using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,VIEW_EXPANDED_TEXT,SD_ID,OWNER,TBL_TYPE,LAST_ACCESS_TIME,VIEW_ORIGINAL_TEXT,TBL_NAME,DB_ID,RETENTION) VALUES (?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED' cannot accept a NULL value.
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
at org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:814)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
at com.sun.proxy.$Proxy15.createTable(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1416)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1449)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy17.create_table_with_environment_context(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2050)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:97)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:669)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:657)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy18.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:714)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:468)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:466)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:466)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:272)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255)
at org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:466)
at org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:479)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$createDataSourceTable(HiveExternalCatalog.scala:367)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:243)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:119)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:304)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:184)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:458)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:437)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:35)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:37)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:39)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
at $line29.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
at $line29.$read$$iw$$iw$$iw$$iw.<init>(<console>:45)
at $line29.$read$$iw$$iw$$iw.<init>(<console>:47)
at $line29.$read$$iw$$iw.<init>(<console>:49)
at $line29.$read$$iw.<init>(<console>:51)
at $line29.$read.<init>(<console>:53)
at $line29.$read$.<init>(<console>:57)
at $line29.$read$.<clinit>(<console>)
at $line29.$eval$.$print$lzycompute(<console>:7)
at $line29.$eval$.$print(<console>:6)
at $line29.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:415)
at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5$$anonfun$apply$6.apply(ILoop.scala:427)
at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5$$anonfun$apply$6.apply(ILoop.scala:423)
at scala.reflect.io.Streamable$Chars$class.applyReader(Streamable.scala:111)
at scala.reflect.io.File.applyReader(File.scala:50)
at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5.apply(ILoop.scala:423)
at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5.apply(ILoop.scala:423)
at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:91)
at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1.apply(ILoop.scala:422)
at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1.apply(ILoop.scala:422)
at scala.tools.nsc.interpreter.ILoop.savingReader(ILoop.scala:96)
at scala.tools.nsc.interpreter.ILoop.interpretAllFrom(ILoop.scala:421)
at scala.tools.nsc.interpreter.ILoop$$anonfun$run$3$1.apply(ILoop.scala:577)
at scala.tools.nsc.interpreter.ILoop$$anonfun$run$3$1.apply(ILoop.scala:576)
at scala.tools.nsc.interpreter.ILoop.withFile(ILoop.scala:570)
at scala.tools.nsc.interpreter.ILoop.run$3(ILoop.scala:576)
at scala.tools.nsc.interpreter.ILoop.loadCommand(ILoop.scala:583)
at scala.tools.nsc.interpreter.ILoop$$anonfun$standardCommands$8.apply(ILoop.scala:207)
at scala.tools.nsc.interpreter.ILoop$$anonfun$standardCommands$8.apply(ILoop.scala:207)
at scala.tools.nsc.interpreter.LoopCommands$LineCmd.apply(LoopCommands.scala:62)
at scala.tools.nsc.interpreter.ILoop.colonCommand(ILoop.scala:688)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:679)
at scala.tools.nsc.interpreter.ILoop.loadFiles(ILoop.scala:835)
at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:111)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
at org.apache.spark.repl.Main$.doMain(Main.scala:76)
at org.apache.spark.repl.Main$.main(Main.scala:56)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
[...]
at org.apache.derby.impl.sql.execute.InsertResultSet.getNextRowCore(Unknown Source)
at org.apache.derby.impl.sql.execute.InsertResultSet.open(Unknown Source)
at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
... 154 more
(Jeez, those error stacks are long)
I surmise the most relevant lines are:
2018-03-25 01:13:04 WARN Persist:96 - Insert of object
"org.apache.hadoop.hive.metastore.model.MTable@5e251945"
using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,
VIEW_EXPANDED_TEXT,SD_ID,OWNER,TBL_TYPE,LAST_ACCESS_TIME,
VIEW_ORIGINAL_TEXT,TBL_NAME,DB_ID,RETENTION) VALUES
(?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED'
cannot accept a NULL value.
(Line breaks inserted by me)
Corruption of the entire Hive setup:
$ clear ; hive -e "use xxx; show tables;"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/xxx/bin/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/xxx/bin/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/home/xxx/bin/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
The data and some paths were sanitized.
To restore order, I delete the metastore_db directory and the derby.log file, then re-run: schematool -initSchema -dbType derby.
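Spelled out as commands (a sketch; it assumes the embedded Derby metastore_db directory and derby.log live in the directory I launch from):
$ rm -rf metastore_db derby.log
$ schematool -initSchema -dbType derby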
I began fiddling with this Spark + Hive configuration yesterday, so any sort of workaround is welcome.
Thanks in advance!
Upvotes: 0
Views: 1659
Reputation: 470
I ran into the same problem with Spark 2.3.
Upgrading to the latest Hive 3.0 fixes this bug. Hive 3.0 was released in May 2018 with a SQL schema fix, HIVE-18046, specifically for this problem. Note that the fix only ships with Hive 3.0 and the corresponding SQL upgrade package, as you can see here. Hive versions released before May 2018 won't contain it.
If you don't want to use the latest Hive, you may need to manually execute the SQL that fixes this bug.
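For the record, a sketch of that manual fix on an embedded Derby metastore (my assumption, mirroring what HIVE-18046 does: give IS_REWRITE_ENABLED a default so clients that never set the field, such as the Hive 1.2.1 client bundled with Spark, stop tripping the NOT NULL constraint; table and column layout may differ on MySQL or PostgreSQL metastores):
-- connect with Derby's ij tool, e.g. ij> connect 'jdbc:derby:metastore_db';
-- hypothetical patch: let an omitted IS_REWRITE_ENABLED default to 'N' instead of NULL
ALTER TABLE APP.TBLS ALTER COLUMN IS_REWRITE_ENABLED SET DEFAULT 'N';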
Upvotes: 1
Reputation: 322
After looking for more alternatives I found this,
which states: "The latest released version of Hive as of now is 2.1.x and it does not support Spark 2.x."
So I reverted to the following versions:
apache-hive-1.2.2-bin.tar.gz
hadoop-2.7.5.tar.gz
spark-2.3.0-bin-hadoop2.7.tgz
and it now works as expected.
Previously I had:
apache-hive-2.3.2-bin.tar.gz
hadoop-3.0.0.tar.gz
spark-2.3.0-bin-hadoop2.7.tgz
which are the latest available as of the time of this post.
Best of luck!
Upvotes: 1