Reputation: 163
Please tell me what the difference is between the two commands below:
sqoop import --connect jdbc:mysql://localhost:3306/db1 \
--username root --password password \
--table tableName --hive-table tableName --create-hive-table --hive-import;
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/db1 \
--username root --password password;
What is the difference between using --create-hive-table
and just create-hive-table
in these two commands?
Upvotes: 0
Views: 1134
Reputation: 509
Consider the two commands:
1) When --create-hive-table
is used, the contents of the RDBMS table are copied to the location given by --target-dir
(an HDFS location). Sqoop then checks whether the table sqoophive.emp exists in Hive.
If the table does not exist in Hive, the data from the HDFS location is moved into the Hive table and everything goes well.
If the table (sqoophive.emp) already exists in Hive, an error is thrown: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Table emp already exists)
Example:
sqoop import \
--connect jdbc:mysql://jclient.ambari.org/sqoop \
--username hdfs -P \
--table employee \
--target-dir /user/hive/sqoop/employee \
--delete-target-dir \
--hive-import \
--hive-table sqoophive.emp \
--create-hive-table \
--fields-terminated-by ',' \
--num-mappers 3
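After the import, one quick way to confirm that the table was both created and populated (this assumes a local Hive CLI is available; the table name reuses the example above):
# Show the table definition and row count for the newly created table
hive -e "DESCRIBE FORMATTED sqoophive.emp; SELECT COUNT(*) FROM sqoophive.emp;"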
2) When create-hive-table
is used without hive-import,
the schema of sqoop.employee (in the RDBMS) is fetched and used to create a table under the default database in Hive (default.employee). No data is transferred.
Example (a modified form of the one given in the book Hadoop: The Definitive Guide by Tom White):
sqoop create-hive-table \
--connect jdbc:mysql://jclient.ambari.org/sqoop \
--username hdfs -P \
--table employee \
--fields-terminated-by ','
Now the question is when to use which. The former is used when the data is present only in the RDBMS and we need to not only create but also populate the table in Hive in one go. The latter is used when the table has to be created in Hive but not populated, or when the data already exists in HDFS and will be used to populate the Hive table later (see the sketch below).
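For that last case, a minimal sketch of how data already sitting in HDFS could be loaded into the table that create-hive-table produced. The path is an assumption, standing in for wherever a plain sqoop import put the files; the table name follows the default.employee example above:
# Move the previously imported files into the Hive table created by create-hive-table
hive -e "LOAD DATA INPATH '/user/hive/sqoop/employee' INTO TABLE default.employee;"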
Upvotes: 1
Reputation: 1811
sqoop-import --connect jdbc:mysql://localhost:3306/db1 \
--username root --password password \
--table tableName --hive-table tableName --create-hive-table --hive-import;
The above command will import data from the database into Hive using Hive's default settings, and if the table is not already present it will create a table in Hive with the same schema as in the database.
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/db1 \
--username root --password password;
The create-hive-table tool will create a table in the Hive metastore, with a definition based on a database table previously imported to HDFS, or one planned to be imported (it will pick it up from the sqoop job). This effectively performs the --hive-import step of sqoop-import without running the preceding import.
For example, suppose you have imported table1 from db1 into HDFS using Sqoop. If you execute create-hive-table next, it will create a table in the Hive metastore with the schema of table1 from db1. This is useful when you want to load data into that table later, whenever needed, as sketched below.
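A minimal sketch of that sequence, assuming the same connection details as in the question; the table name table1, the --target-dir path, and the explicit field delimiter are illustrative choices, not part of the original commands:
# 1) Import to HDFS only, no Hive step yet
sqoop import \
--connect jdbc:mysql://localhost:3306/db1 \
--username root --password password \
--table table1 \
--target-dir /user/hive/sqoop/table1 \
--fields-terminated-by ','
# 2) Create the empty Hive table from the RDBMS schema
#    (the same delimiter keeps the table definition consistent with the files above)
sqoop create-hive-table \
--connect jdbc:mysql://localhost:3306/db1 \
--username root --password password \
--table table1 \
--hive-table table1 \
--fields-terminated-by ','
# 3) Populate the table later from the files already in HDFS
hive -e "LOAD DATA INPATH '/user/hive/sqoop/table1' INTO TABLE table1;"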
Upvotes: 0