Reputation: 1459
I want to set up my local Spark to allow multiple simultaneous connections (e.g. a notebook, a BI tool, an application, etc.), so I have to move away from the default Derby metastore.
My hive-site.xml is as follows:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>spark@localhost</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>spark</value>
</property>
<property>
  <name>datanucleus.schema.autoCreateTables</name>
  <value>true</value>
</property>
I set "datanucleus.schema.autoCreateTables" to true as suggested by Spark; "createDatabaseIfNotExist=true" does not seem to have any effect.
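For context, this is roughly how I launch Spark so that it picks up hive-site.xml and the MySQL driver (the connector jar path and version here are just examples from my machine):

```shell
# hive-site.xml lives in $SPARK_HOME/conf so Spark finds it automatically;
# the MySQL JDBC driver must be on the driver/executor classpath.
$SPARK_HOME/bin/spark-sql \
  --jars /path/to/mysql-connector-java-8.0.27.jar \
  --conf spark.sql.catalogImplementation=hive
```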
But that still fails with errors like:
21/12/26 04:34:20 WARN Datastore: SQL Warning : 'BINARY as attribute of a type' is deprecated and will be removed in a future release. Please use a CHARACTER SET clause with _bin collation instead
21/12/26 04:34:20 ERROR Datastore: Error thrown executing CREATE TABLE `TBLS`
(
`TBL_ID` BIGINT NOT NULL,
`CREATE_TIME` INTEGER NOT NULL,
`DB_ID` BIGINT NULL,
`LAST_ACCESS_TIME` INTEGER NOT NULL,
`OWNER` VARCHAR(767) BINARY NULL,
`RETENTION` INTEGER NOT NULL,
`IS_REWRITE_ENABLED` BIT NOT NULL,
`SD_ID` BIGINT NULL,
`TBL_NAME` VARCHAR(256) BINARY NULL,
`TBL_TYPE` VARCHAR(128) BINARY NULL,
`VIEW_EXPANDED_TEXT` TEXT [CHARACTER SET charset_name] [COLLATE collation_name] NULL,
`VIEW_ORIGINAL_TEXT` TEXT [CHARACTER SET charset_name] [COLLATE collation_name] NULL,
CONSTRAINT `TBLS_PK` PRIMARY KEY (`TBL_ID`)
) ENGINE=INNODB : You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[CHARACTER SET charset_name] [COLLATE collation_name] NULL,
and such.
Please advise.
Upvotes: 2
Views: 494
Reputation: 2505
Alternatively, you could run the Hive-provided schema scripts directly on the database. Scripts for the different backends are in the Apache Hive repository on GitHub, under the metastore scripts directory.
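For a MySQL backend that would look something like this (the schema script version is an example; pick the one matching the Hive version your Spark build is compiled against):

```shell
# Create the metastore database, then load Hive's schema script into it.
mysql -u spark -p -e "CREATE DATABASE IF NOT EXISTS hive_metastore;"
mysql -u spark -p hive_metastore < hive-schema-2.3.0.mysql.sql
```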
Upvotes: 0
Reputation: 1459
OK, I got it working.
Basically I can't rely on Spark to initialize the MySQL metastore schema automatically, even though it was able to initialize the Derby one.
So I had to download both Hadoop and Hive, and use the schematool bundled with Hive to set up the metastore.
After that, Spark is able to use it directly.
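The schematool step looks roughly like this (assuming HIVE_HOME points at the Hive download and the same hive-site.xml is in Hive's conf directory):

```shell
# Initialize the metastore schema in MySQL using Hive's schematool.
$HIVE_HOME/bin/schematool -dbType mysql -initSchema

# Optionally verify the schema version afterwards.
$HIVE_HOME/bin/schematool -dbType mysql -info
```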
Upvotes: 1