Max

Reputation: 1459

Unable to use MySQL as Hive Metastore for Spark

I want to set up my local Spark installation to allow multiple concurrent connections (e.g. a notebook, a BI tool, an application, etc.), so I have to move away from Derby.

My hive-site.xml is as follows:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>spark@localhost</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>spark</value>
</property>

<property>
  <name>datanucleus.schema.autoCreateTables</name>
  <value>true</value>
</property>

I set "datanucleus.schema.autoCreateTables" to true as suggested by Spark. "createDatabaseIfNotExist=true" does not seem to do anything.
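For reference, here is a hedged sketch of the MySQL-side setup this configuration assumes (database, user, and grants; the names match the hive-site.xml above, and the admin account and host are assumptions):

```shell
# Hedged sketch: create the metastore database and user by hand,
# since createDatabaseIfNotExist=true may not take effect.
# Run as a MySQL admin user; names mirror the hive-site.xml above.
mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS hive_metastore;
CREATE USER IF NOT EXISTS 'spark'@'localhost' IDENTIFIED BY 'spark';
GRANT ALL PRIVILEGES ON hive_metastore.* TO 'spark'@'localhost';
FLUSH PRIVILEGES;
SQL
```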

But that still fails with

21/12/26 04:34:20 WARN Datastore: SQL Warning : 'BINARY as attribute of a type' is deprecated and will be removed in a future release. Please use a CHARACTER SET clause with _bin collation instead
21/12/26 04:34:20 ERROR Datastore: Error thrown executing CREATE TABLE `TBLS`
(
    `TBL_ID` BIGINT NOT NULL,
    `CREATE_TIME` INTEGER NOT NULL,
    `DB_ID` BIGINT NULL,
    `LAST_ACCESS_TIME` INTEGER NOT NULL,
    `OWNER` VARCHAR(767) BINARY NULL,
    `RETENTION` INTEGER NOT NULL,
    `IS_REWRITE_ENABLED` BIT NOT NULL,
    `SD_ID` BIGINT NULL,
    `TBL_NAME` VARCHAR(256) BINARY NULL,
    `TBL_TYPE` VARCHAR(128) BINARY NULL,
    `VIEW_EXPANDED_TEXT` TEXT [CHARACTER SET charset_name] [COLLATE collation_name] NULL,
    `VIEW_ORIGINAL_TEXT` TEXT [CHARACTER SET charset_name] [COLLATE collation_name] NULL,
    CONSTRAINT `TBLS_PK` PRIMARY KEY (`TBL_ID`)
) ENGINE=INNODB : You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[CHARACTER SET charset_name] [COLLATE collation_name] NULL,

and such.

Please advise.

Upvotes: 2

Views: 494

Answers (2)

Triamus

Reputation: 2505

Alternatively, you could run the Hive-provided schema scripts directly against the database. Scripts for the different backends are in the Apache Hive repository on GitHub under the metastore scripts directory.
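Applying a script by hand can be sketched like this (the script filename is an example; pick the one matching your Hive version from the repository):

```shell
# Hedged sketch: apply the Hive metastore DDL for MySQL manually.
# hive-schema-3.1.0.mysql.sql is an example version; the target
# database must already exist and the user must have DDL rights.
mysql -u spark -p hive_metastore < hive-schema-3.1.0.mysql.sql
```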

Upvotes: 0

Max

Reputation: 1459

OK, I solved it.

Basically, I can't rely on Spark to initialize the MySQL metastore schema automatically, even though it was able to initialize the Derby one.

So I had to download both Hadoop and Hive, and use the schemaTool bundled with Hive to set up the metastore.

After that, Spark is able to use it directly.
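The steps above can be sketched as follows (the install paths are assumptions; schematool reads the JDBC settings from the hive-site.xml on its classpath):

```shell
# Hedged sketch: initialize the MySQL metastore schema with the
# schematool shipped in Hive's bin directory. /opt paths are assumed.
export HADOOP_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export PATH="$HIVE_HOME/bin:$PATH"
# Place (or symlink) the hive-site.xml shown above into $HIVE_HOME/conf,
# then create the metastore tables:
schematool -dbType mysql -initSchema
```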

Upvotes: 1
