Reputation: 291
I have a question on the Sqoop import utility. I understand we can run a "sqoop import" and get the data from an RDBMS (SQL Server in my case) and put it directly into a Hive table (which will be created dynamically).
My question is: how do I create partitions in this Hive table, if I have to, with the "sqoop import" utility (is it possible?).
After the "sqoop import to Hive" is done, I always see a Hive table which is not partitioned. My requirement is to have a table partitioned on columns x, y, z.
Thanks, Sid
Upvotes: 1
Views: 3056
Reputation: 4674
You can import data directly into a Hive table, and you can create a partitioned table and load it directly using Sqoop. Please find the code below:
sqoop import \
--connect "jdbc:sqlserver://yourservername:1433;databaseName=EMP" \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--username youruserid \
--password yourpassword \
--fields-terminated-by '|' \
--as-textfile \
--delete-target-dir \
--target-dir 'hdfspathlocation' \
--hive-import \
--hive-overwrite \
--hive-table UDB.EMPLOYEE_PARTITION_TABLE \
--hive-partition-key EMPLOYEE_CITY \
--hive-partition-value 'NOIDA' \
--num-mappers 1 \
--query "select TEST_EMP_ID,TEST_EMP_NAME,TEST_EMP_DEPARTMENT,TEST_EMP_SALARY,TEST_EMP_CITY FROM EMP.dbo.TEST_EMP_TABLE where TEST_EMP_CITY = 'NOIDA' AND \$CONDITIONS";
As you can see, this Sqoop import will create a partitioned table UDB.EMPLOYEE_PARTITION_TABLE in Hive, with EMPLOYEE_CITY as the partition column.
This will create a managed table in Hive with the data stored as text. Below is the schema of the Hive table:
+--------------------------+-----------------------+-----------------------+
| col_name                 | data_type             | comment               |
+--------------------------+-----------------------+-----------------------+
| test_emp_id              | int                   |                       |
| test_emp_name            | string                |                       |
| test_emp_department      | string                |                       |
| test_emp_salary          | int                   |                       |
| test_emp_city            | string                |                       |
| employee_city            | string                |                       |
|                          | NULL                  | NULL                  |
| # Partition Information  | NULL                  | NULL                  |
| # col_name               | data_type             | comment               |
|                          | NULL                  | NULL                  |
| employee_city            | string                |                       |
+--------------------------+-----------------------+-----------------------+
The partition directory also shows up on HDFS:
0 2018-11-30 00:01 /hdfspathlocation/udb.db/employee_partition_table/employee_city=NOIDA
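If you want to double-check the result from the Hive side, the commands below should do it (a minimal sketch, assuming the hive CLI is available and reusing the table name from the example above):

# list the partitions Sqoop created; you should see employee_city=NOIDA
hive -e "SHOW PARTITIONS UDB.EMPLOYEE_PARTITION_TABLE"

# sample a few rows from that partition
hive -e "SELECT * FROM UDB.EMPLOYEE_PARTITION_TABLE WHERE employee_city = 'NOIDA' LIMIT 10"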
You need to make sure of a few things. Your hive-partition-key column name should not be a column of your database table when you are using hive-import; otherwise you will get the error below.
Imported Failed: Partition key TEST_EMP_CITY cannot be a column to import.
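For example, a combination like the hypothetical one below (column names reused from the query above) triggers that error, because the partition key name matches one of the columns being imported:

--hive-partition-key TEST_EMP_CITY \
--query "select TEST_EMP_ID,TEST_EMP_NAME,TEST_EMP_DEPARTMENT,TEST_EMP_SALARY,TEST_EMP_CITY FROM EMP.dbo.TEST_EMP_TABLE where TEST_EMP_CITY = 'NOIDA' AND \$CONDITIONS"

Picking a partition key name that is not a source column (EMPLOYEE_CITY in the example above) avoids the clash.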
Keep your partition column at the end of your select statement when specifying the query in the Sqoop import:
select TEST_EMP_ID,TEST_EMP_NAME,TEST_EMP_DEPARTMENT,TEST_EMP_SALARY,TEST_EMP_CITY FROM EMP.dbo.TEST_EMP_TABLE where TEST_EMP_CITY = 'NOIDA' AND \$CONDITIONS
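One more thing to keep in mind: --hive-partition-value takes a single static value, so each Sqoop run loads exactly one partition. To load another city you would run the import again with a different value and a matching WHERE filter, along the lines of the sketch below ('DELHI' is just a placeholder value, the connection details are the same as above, and --hive-overwrite is left out since the table already exists and the intent is only to add a new partition):

sqoop import \
--connect "jdbc:sqlserver://yourservername:1433;databaseName=EMP" \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--username youruserid \
--password yourpassword \
--fields-terminated-by '|' \
--as-textfile \
--delete-target-dir \
--target-dir 'hdfspathlocation' \
--hive-import \
--hive-table UDB.EMPLOYEE_PARTITION_TABLE \
--hive-partition-key EMPLOYEE_CITY \
--hive-partition-value 'DELHI' \
--num-mappers 1 \
--query "select TEST_EMP_ID,TEST_EMP_NAME,TEST_EMP_DEPARTMENT,TEST_EMP_SALARY,TEST_EMP_CITY FROM EMP.dbo.TEST_EMP_TABLE where TEST_EMP_CITY = 'DELHI' AND \$CONDITIONS"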
Let me know if this works for you.
Upvotes: 4