fmv1992

Reputation: 322

Insertion of Spark DataFrame into Hive table causes Hive setup corruption

Summary: Trying to insert a Spark DataFrame into a Hive table causes a loop of errors and corruption of the metastore database.

Details:

  1. Loop of errors:

    import org.apache.spark.sql.SaveMode // needed for SaveMode.Overwrite

    df.show(5)
    df.write.mode(SaveMode.Overwrite).saveAsTable("dbnamexxx.tablenamexxx")
    

    Yields:

    +---+---+------+---+-------+-------------------+---------------+-------------+--------+
    | zz|zzz|zzzzzz| zz|zzzzzzz|         zzzzz_zzzz|zzzzzzzzzz_zzzz|zz_zzzzzzzzzz|zz_zzzzz|
    +---+---+------+---+-------+-------------------+---------------+-------------+--------+
    |833| 13|     1| 19|    477|2017-11-00 00000000|           null|            0|      29|
    |833|  3|     1| 13|    280|2017-11-00 00000000|           null|            0|      29|
    |833|  9|     1| 13|    442|2017-11-00 00000000|           null|            0|      29|
    |833|  3|     1| 19|    173|2017-11-00 00000000|           null|            0|      29|
    |833| 14|     1| 17|    360|2017-11-00 00000000|           null|            0|      29|
    +---+---+------+---+-------+-------------------+---------------+-------------+--------+
    

    (Included just to show that the DataFrame is OK.)

    Then the error (which repeats itself every ~2 seconds):

    [Stage 5:===>                                                    (13 + 4) / 200]2018-03-25 01:12:53 WARN  DFSClient:611 - Caught exception 
    java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1252)
        at java.lang.Thread.join(Thread.java:1326)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
    2018-03-25 01:13:04 WARN  Persist:96 - Insert of object "org.apache.hadoop.hive.metastore.model.MTable@5e251945" using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,VIEW_EXPANDED_TEXT,SD_ID,OWNER,TBL_TYPE,LAST_ACCESS_TIME,VIEW_ORIGINAL_TEXT,TBL_NAME,DB_ID,RETENTION) VALUES (?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED'  cannot accept a NULL value.
    2018-03-25 01:13:04 ERROR RetryingHMSHandler:173 - Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDODataStoreException: Insert of object "org.apache.hadoop.hive.metastore.model.MTable@5e251945" using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,VIEW_EXPANDED_TEXT,SD_ID,OWNER,TBL_TYPE,LAST_ACCESS_TIME,VIEW_ORIGINAL_TEXT,TBL_NAME,DB_ID,RETENTION) VALUES (?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED'  cannot accept a NULL value.
        at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
        at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
        at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
        at org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:814)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
        at com.sun.proxy.$Proxy15.createTable(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1416)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1449)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
        at com.sun.proxy.$Proxy17.create_table_with_environment_context(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2050)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:97)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:669)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:657)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
        at com.sun.proxy.$Proxy18.createTable(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:714)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:468)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:466)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:466)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:272)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255)
        at org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:466)
        at org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:479)
        at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$createDataSourceTable(HiveExternalCatalog.scala:367)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:243)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
        at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:119)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:304)
        at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:184)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
        at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:458)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:437)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393)
        at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
        at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:35)
        at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:37)
        at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:39)
        at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
        at $line29.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
        at $line29.$read$$iw$$iw$$iw$$iw.<init>(<console>:45)
        at $line29.$read$$iw$$iw$$iw.<init>(<console>:47)
        at $line29.$read$$iw$$iw.<init>(<console>:49)
        at $line29.$read$$iw.<init>(<console>:51)
        at $line29.$read.<init>(<console>:53)
        at $line29.$read$.<init>(<console>:57)
        at $line29.$read$.<clinit>(<console>)
        at $line29.$eval$.$print$lzycompute(<console>:7)
        at $line29.$eval$.$print(<console>:6)
        at $line29.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
        at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
        at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
        at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
        at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:415)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5$$anonfun$apply$6.apply(ILoop.scala:427)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5$$anonfun$apply$6.apply(ILoop.scala:423)
        at scala.reflect.io.Streamable$Chars$class.applyReader(Streamable.scala:111)
        at scala.reflect.io.File.applyReader(File.scala:50)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5.apply(ILoop.scala:423)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5.apply(ILoop.scala:423)
        at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:91)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1.apply(ILoop.scala:422)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1.apply(ILoop.scala:422)
        at scala.tools.nsc.interpreter.ILoop.savingReader(ILoop.scala:96)
        at scala.tools.nsc.interpreter.ILoop.interpretAllFrom(ILoop.scala:421)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$run$3$1.apply(ILoop.scala:577)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$run$3$1.apply(ILoop.scala:576)
        at scala.tools.nsc.interpreter.ILoop.withFile(ILoop.scala:570)
        at scala.tools.nsc.interpreter.ILoop.run$3(ILoop.scala:576)
        at scala.tools.nsc.interpreter.ILoop.loadCommand(ILoop.scala:583)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$standardCommands$8.apply(ILoop.scala:207)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$standardCommands$8.apply(ILoop.scala:207)
        at scala.tools.nsc.interpreter.LoopCommands$LineCmd.apply(LoopCommands.scala:62)
        at scala.tools.nsc.interpreter.ILoop.colonCommand(ILoop.scala:688)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:679)
        at scala.tools.nsc.interpreter.ILoop.loadFiles(ILoop.scala:835)
        at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:111)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
        at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
        at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
        at org.apache.spark.repl.Main$.doMain(Main.scala:76)
        at org.apache.spark.repl.Main$.main(Main.scala:56)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    
        [...]
    
        at org.apache.derby.impl.sql.execute.InsertResultSet.getNextRowCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.InsertResultSet.open(Unknown Source)
        at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
        at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
        ... 154 more
    

    (Jeez, those error stacks are long.)

    I surmise the most relevant lines are:

        2018-03-25 01:13:04 WARN  Persist:96 - Insert of object
        "org.apache.hadoop.hive.metastore.model.MTable@5e251945"
        using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,
        VIEW_EXPANDED_TEXT,SD_ID,OWNER,TBL_TYPE,LAST_ACCESS_TIME,
        VIEW_ORIGINAL_TEXT,TBL_NAME,DB_ID,RETENTION) VALUES
        (?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED'
        cannot accept a NULL value.
    

    (Line breaks inserted by me)

  2. Corruption of the entire Hive setup:

    $ clear ; hive -e "use xxx; show tables;"                 
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/xxx/bin/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/xxx/bin/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/home/xxx/bin/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
    FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    

The data and some paths were sanitized.

To restore a working state, I delete the metastore_db directory and the derby.log file, then re-initialize the schema with schematool -initSchema -dbType derby.
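
Spelled out as commands (just a sketch of that recovery; run it from the directory that holds the embedded Derby metastore, and adjust paths to your setup):

    $ # Wipe the corrupted embedded Derby metastore and its log.
    $ rm -rf metastore_db derby.log
    $ # Recreate the metastore schema from scratch.
    $ schematool -initSchema -dbType derby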

I began fiddling with this Spark + Hive configuration only yesterday, so any sort of workaround is welcome.

Thanks in advance!

Upvotes: 0

Views: 1659

Answers (2)

Heyang Wang

Reputation: 470

I ran into the same problem with Spark 2.3.

Using the latest Hive 3.0 fixes this bug. Hive 3.0 was released in May 2018 with a SQL fix, HIVE-18046, made especially for this issue. Note that the fix shipped only with Hive 3.0 and one other SQL update package, as you can see from here; Hive versions released before May 2018 won't contain it.

If you don't want to use the latest Hive, you may need to manually execute the SQL that fixes this bug; see the sketch below.
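
For illustration only — I haven't run this exact statement; the column name comes straight from the error above, but the DEFAULT 'N' value and the Derby ALTER syntax are my assumptions based on the HIVE-18046 change, so check the official upgrade script for your metastore database. With an embedded Derby metastore the manual fix would look roughly like this, using Derby's ij tool (DERBY_HOME is just a placeholder for wherever your Derby jars live):

    $ java -cp "$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar" org.apache.derby.tools.ij
    ij> CONNECT 'jdbc:derby:metastore_db';
    ij> -- Give TBLS.IS_REWRITE_ENABLED a non-NULL default so that older Hive
    ij> -- clients, which omit the column, can still insert rows.
    ij> ALTER TABLE APP.TBLS ALTER COLUMN IS_REWRITE_ENABLED SET DEFAULT 'N';
    ij> EXIT;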

Upvotes: 1

fmv1992

Reputation: 322

After looking for more alternatives I found:

Cloudera Hive on Spark 2.x?

which states: "The latest released version of Hive as of now is 2.1.x and it does not support Spark 2.x."

So I reverted to the following versions:

  • apache-hive-1.2.2-bin.tar.gz

  • hadoop-2.7.5.tar.gz

  • spark-2.3.0-bin-hadoop2.7.tgz

and it now works as expected.

Previously I had:

  • apache-hive-2.3.2-bin.tar.gz

  • hadoop-3.0.0.tar.gz

  • spark-2.3.0-bin-hadoop2.7.tgz

which were the latest versions available as of the time of this post.
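
My reading of why the downgrade works (plausible, but I haven't verified it): Spark 2.3 talks to the metastore through its built-in Hive 1.2.1 client by default (the spark.sql.hive.metastore.version setting), so it matches a Hive 1.2.x installation but not a metastore schema created by Hive 2.3.2's schematool. A quick way to check from the spark-shell (the fallback string is just my label for the case where the setting isn't readable at runtime):

    scala> // Spark's own version:
    scala> spark.version
    res0: String = 2.3.0

    scala> // The Hive metastore client version Spark uses (2.3's default is 1.2.1):
    scala> scala.util.Try(spark.conf.get("spark.sql.hive.metastore.version")).getOrElse("1.2.1 (built-in default)")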

Best of luck!

Upvotes: 1
