Reputation: 91
The scenario is: I have a Hive table with 10 columns, and I want to export the data from that Hive table to an Oracle table using Sqoop.
But the target Oracle table has 30 columns with different names than the Hive table's columns. Also, the column positions in the Oracle table are not the same as in the Hive table.
Can anyone please suggest how to write the Sqoop export command for this case?
Upvotes: 3
Views: 12488
Reputation: 13753
First of all, you can't export data directly from Hive to Oracle.
You need to EXPORT the Hive table to HDFS first.
Sample command:
export table mytable to 'some_hdfs_location'
Alternatively, use the HDFS data location of your Hive table directly.
Command to check the location:
show create table mytable
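For illustration, the output contains a LOCATION clause pointing at the table's data directory (the path below is hypothetical and depends on your cluster):
LOCATION
  'hdfs://namenode:8020/user/hive/warehouse/mytable'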
Either way, you now have the HDFS location of the data for your Hive table.
You can use the --columns argument in the Sqoop export command to choose the column order and the number of columns.
There is no problem with different column names.
Take a simple example: your Hive table has columns c1, c2, c3, and the Oracle table has columns col1, col2, col3, col4, col5.
You want to map c1 to col2, c2 to col5, and c3 to col1.
Then use --columns "col2,col5,col1" in your sqoop command.
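A minimal sketch of such a command, assuming placeholder Oracle connection details and target table name, that some_hdfs_location is the HDFS directory actually holding the data files, and that those files are '|'-delimited (adjust the delimiter to however your data is really stored):
sqoop export \
--connect jdbc:oracle:thin:@//oraclehost:1521/ORCL \
--username user \
--password pwd \
--table oracle_table \
--columns "col2,col5,col1" \
--export-dir some_hdfs_location \
--fields-terminated-by '|' \
--input-null-string '\\N' \
--input-null-non-string '\\N'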
As per Sqoop docs,
By default, all columns within a table are selected for export. You can select a subset of columns and control their ordering by using the --columns argument. This should include a comma-delimited list of columns to export. For example: --columns "col1,col2,col3". Note that columns that are not included in the --columns parameter need to have either defined default value or allow NULL values. Otherwise your database will reject the imported data which in turn will make Sqoop job fail.
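For example (hypothetical table and column names), a column left out of --columns could be given a default value or made nullable on the Oracle side:
ALTER TABLE oracle_table MODIFY (col4 DEFAULT 0);
ALTER TABLE oracle_table MODIFY (col3 NULL);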
Upvotes: 2
Reputation: 2089
Try the below. It is assumed that your Hive table is created as an external table, your data is located at /myhivetable/data/, fields are terminated by '|', and lines are terminated by '\n'.
In your RDBMS table, the 20 columns which are not going to be populated from the Hive data on HDFS should have default values or allow NULL values.
Let us suppose your database columns are DC1, DC2, DC3, ..., DC30, your Hive columns are c1, c2, c3, ..., c10, and your mapping is as below.
DC1 -- c8
DC2 -- c1
DC3 -- c2
DC4 -- c4
DC5 -- c3
DC6 -- c7
DC7 -- c10
DC8 -- c9
DC9 -- c5
DC10 -- c6
sqoop export \
--connect jdbc:postgresql://10.10.11.11:1234/db \
--table table1 \
--username user \
--password pwd \
--export-dir /myhivetable/data/ \
--columns "DC2,DC3,DC5,DC4,DC9,DC10,DC6,DC1,DC8,DC7" \
--fields-terminated-by '|' \
--lines-terminated-by '\n' \
--input-null-string "\\\\N" \
--input-null-non-string "\\\\N"
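Note that the --connect string above is a PostgreSQL example; for the Oracle target in the question it would look more like the following (host, port, and service name are placeholders):
--connect jdbc:oracle:thin:@//oraclehost:1521/ORCL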
Upvotes: 2
Reputation: 1323
There are 2 options:
1. As of now, sqoop export is very limited (presumably because exporting is not the main intended functionality; importing is). It only gives you the option to specify the --export-dir, which is the table's warehouse directory, and it loads all columns. So you may need to load into a staging table and then load that into the original base table with the relevant column mapping, as sketched below.
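A rough sketch of that staging approach, with placeholder connection details, a hypothetical staging table STG_HIVETABLE laid out exactly like the Hive table, and hypothetical column names in the real target table:
sqoop export \
--connect jdbc:oracle:thin:@//oraclehost:1521/ORCL \
--username user \
--password pwd \
--table STG_HIVETABLE \
--export-dir /user/hive/warehouse/hivetable
Then remap the columns inside Oracle:
INSERT INTO TARGET_TABLE (target_col1, target_col2)
SELECT hive_col1, hive_col2 FROM STG_HIVETABLE;
COMMIT;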
2. You can export the data from Hive using:
INSERT OVERWRITE DIRECTORY '/user/hive_exp/orders' SELECT column1, column2 FROM hivetable;
Then use Oracle's native import tool (e.g. SQL*Loader). This gives more flexibility; a sketch follows below.
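For example, a minimal SQL*Loader sketch. Everything below (file names, table and column names, the ',' delimiter) is an assumption, not from the original answer; the exported files would first be copied out of HDFS (e.g. with hdfs dfs -get), and the ',' delimiter assumes the INSERT OVERWRITE statement wrote comma-delimited output (e.g. via ROW FORMAT DELIMITED FIELDS TERMINATED BY ',').
A control file, say orders.ctl:
LOAD DATA
INFILE 'orders.txt'
APPEND
INTO TABLE oracle_orders
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(target_col1, target_col2)
Then run:
sqlldr userid=user/pwd@ORCL control=orders.ctl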
Please update if you have a better solution.
Upvotes: 0