Suraj Nayak
Suraj Nayak

Reputation: 907

hive-drop-import-delims not removing newline while using HCatalog in Sqoop

Sqoop while used with HCatalog import not able to remove new line (\n) from column data even after using --hive-drop-import-delims option in the command when running Apache Sqoop with Oracle.

Sqoop Query:

    sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username user123 --password passwd123 -table SCHEMA.TBL_2 \ 
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \ 
--split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string ""

Data in Oracle Column col4 as below: (Data has control characters such as ^M)

<li>Details:^M
    <ul>^M
        <li>

Does Control character causing this problem?

Am I missing anything ? Is there any workaround or solution for this problem?

Upvotes: 5

Views: 15178

Answers (3)

Lynn Han
Lynn Han

Reputation: 503

From the official website: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html

Hive will have problems using Sqoop-imported data if your database’s rows contain string fields that have Hive’s default row delimiters (\n and \r characters) or column delimiters (\01 characters) present in them. You can use the --hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data. Alternatively, you can use the --hive-delims-replacement option to replace those characters with a user-defined string on import to give Hive-compatible text data. These options should only be used if you use Hive’s default delimiters and should not be used if different delimiters are specified.

Upvotes: 0

bunty
bunty

Reputation: 234

sqoop import \
--connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username 123 \
--password 123 \
--table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 \
--hive-delims-replacement "anything" \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string ""

You can try this --hive-delims-replacement "anything" this will replace all \n , \t , and \01 characters with the string you provided(in this case replace with string "anything").

Upvotes: 5

Suraj Nayak
Suraj Nayak

Reputation: 907

Use --map-column-java option to explicitly state the column is of type String. Then --hive-drop-import-delims works as expected (to remove \n from data).

Changed Sqoop Command :

sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username user123 --password passwd123 -table SCHEMA.TBL_2 \ 
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \ 
--split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string "" --map-column-java col4=String

Upvotes: 10

Related Questions