Max

Reputation: 1459

Unable to use MySQL as Hive Metastore for Spark

I want to set up my local Spark installation to allow multiple concurrent connections (e.g. a notebook, a BI tool, an application, etc.), so I have to move away from Derby.

My hive-site.xml is as follows:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>spark@localhost</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>spark</value>
</property>

<property>
  <name>datanucleus.schema.autoCreateTables</name>
  <value>true</value>
</property>

I set "datanucleus.schema.autoCreateTables" to true as suggested by Spark. "createDatabaseIfNotExist=true" does not seem to do anything.
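For reference, here is a hedged sketch of the MySQL-side setup this configuration assumes (database, user, and grants; the names match the hive-site.xml above, and the admin account and host are assumptions):

```shell
# Hedged sketch: create the metastore database and user by hand,
# since createDatabaseIfNotExist=true may not take effect.
# Run as a MySQL admin user; names mirror the hive-site.xml above.
mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS hive_metastore;
CREATE USER IF NOT EXISTS 'spark'@'localhost' IDENTIFIED BY 'spark';
GRANT ALL PRIVILEGES ON hive_metastore.* TO 'spark'@'localhost';
FLUSH PRIVILEGES;
SQL
```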

But that still fails with

21/12/26 04:34:20 WARN Datastore: SQL Warning : 'BINARY as attribute of a type' is deprecated and will be removed in a future release. Please use a CHARACTER SET clause with _bin collation instead
21/12/26 04:34:20 ERROR Datastore: Error thrown executing CREATE TABLE `TBLS`
(
    `TBL_ID` BIGINT NOT NULL,
    `CREATE_TIME` INTEGER NOT NULL,
    `DB_ID` BIGINT NULL,
    `LAST_ACCESS_TIME` INTEGER NOT NULL,
    `OWNER` VARCHAR(767) BINARY NULL,
    `RETENTION` INTEGER NOT NULL,
    `IS_REWRITE_ENABLED` BIT NOT NULL,
    `SD_ID` BIGINT NULL,
    `TBL_NAME` VARCHAR(256) BINARY NULL,
    `TBL_TYPE` VARCHAR(128) BINARY NULL,
    `VIEW_EXPANDED_TEXT` TEXT [CHARACTER SET charset_name] [COLLATE collation_name] NULL,
    `VIEW_ORIGINAL_TEXT` TEXT [CHARACTER SET charset_name] [COLLATE collation_name] NULL,
    CONSTRAINT `TBLS_PK` PRIMARY KEY (`TBL_ID`)
) ENGINE=INNODB : You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[CHARACTER SET charset_name] [COLLATE collation_name] NULL,

and such.

Please advise.

Upvotes: 2

Views: 494

Answers (2)

Triamus

Reputation: 2505

Alternatively, you could run the Hive-provided schema scripts directly against the database. Scripts for the different backends are in the Apache Hive repository on GitHub under the metastore scripts directory.
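Applying a script by hand can be sketched like this (the script filename is an example; pick the one matching your Hive version from the repository):

```shell
# Hedged sketch: apply the Hive metastore DDL for MySQL manually.
# hive-schema-3.1.0.mysql.sql is an example version; the target
# database must already exist and the user must have DDL rights.
mysql -u spark -p hive_metastore < hive-schema-3.1.0.mysql.sql
```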

Upvotes: 0

Max

Reputation: 1459

OK, I solved it.

Basically, I can't rely on Spark to initialize the MySQL metastore schema automatically, even though it was able to initialize the Derby one.

So I had to download both Hadoop and Hive, and use the schemaTool bundled with Hive to set up the metastore.

After that, Spark is able to use it directly.
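The steps above can be sketched as follows (the install paths are assumptions; schematool reads the JDBC settings from the hive-site.xml on its classpath):

```shell
# Hedged sketch: initialize the MySQL metastore schema with the
# schematool shipped in Hive's bin directory. /opt paths are assumed.
export HADOOP_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export PATH="$HIVE_HOME/bin:$PATH"
# Place (or symlink) the hive-site.xml shown above into $HIVE_HOME/conf,
# then create the metastore tables:
schematool -dbType mysql -initSchema
```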

Upvotes: 1
