Causality
Causality

Reputation: 1123

How to convert mysql DDL into hive DDL

Given a SQL script containing DDL for creating tables in MySQL database, I would like to convert the script into Hive DDL, so that I can create tables into hive. I could have written an interpreter myself, but thought there might be details I could miss (e.g. data format conversion, int, bigint, time, date, etc.) since I am very new to hive DDL.

I have seen this thread How to transfer mysql table to hive?, which mentioned sqoop http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html. However, from what I see, sqoop certainly translate the DDL, but only as an intermediate step (thus the translated DDL is no where to be found). Am I missing the command that would output the translation with the MySQL DDL as an input?

For example, my MySQL DDL look like:

CREATE TABLE `user_keyword` (
  `username` varchar(32) NOT NULL DEFAULT '',
  `keyword_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`username`,`keyword_id`),
  KEY `keyword_id` (`keyword_id`),
  CONSTRAINT `analyst_keywords_ibfk_1` FOREIGN KEY (`keyword_id`) REFERENCES `keywords` (`keyword_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

And the output Hive DDL would be like:

CREATE TABLE user_keyword (
  username string,
  keyword_id int,
);

Upvotes: 1

Views: 6284

Answers (2)

Tariq
Tariq

Reputation: 34184

Alternatively, one could use the create-hive-table tool to do that. The create-hive-table tool populates a Hive metastore with a definition for a table based on a database table previously imported to HDFS, or one planned to be imported. This effectively performs the --hive-import step of sqoop-import without running the preceeding import. For example,

sqoop create-hive-table --connect jdbc:mysql://localhost/demo -username root --table t2 --fields-terminated-by ',' --hive-table t2

This command will create a blank hive table t2 based on the schema of the same table in MySQL without importing the data.

Upvotes: 2

Charles Menguy
Charles Menguy

Reputation: 41458

I actually thought this was not supported, but after looking at the Source here is what I saw in HiveImport.java:

/**
 * @return true if we're just generating the DDL for the import, but
 * not actually running it (i.e., --generate-only mode). If so, don't
 * do any side-effecting actions in Hive.
 */
private boolean isGenerateOnly() {
  return generateOnly;
}

/**
 * @return a File object that can be used to write the DDL statement.
 * If we're in gen-only mode, this should be a file in the outdir, named
 * after the Hive table we're creating. If we're in import mode, this should
 * be a one-off temporary file.
 */
private File getScriptFile(String outputTableName) throws IOException {
  if (!isGenerateOnly()) {
    return File.createTempFile("hive-script-", ".txt",
        new File(options.getTempDir()));
  } else {
    return new File(new File(options.getCodeOutputDir()),
        outputTableName + ".q");
  }
}

So basically you should be able to do only the DDL generation using the option --generate-only used in cunjunction with --outdir and your table will be create in the output dir specified and named after your table.

For example based on the link you provided:

sqoop import --verbose --fields-terminated-by ',' --connect jdbc:mysql://localhost/test --table employee --hive-import --warehouse-dir /user/hive/warehouse --fields-terminated-by ',' --split-by id --hive-table employee --outdir /tmp/mysql_to_hive/ddl --generate-only

will create /tmp/mysql_to_hive/ddl/employee.q

Upvotes: 2

Related Questions