Devender Prakash
Devender Prakash

Reputation: 91

How can we automate incremental import in SQOOP?

How can we automate the incremental import in SQoop ?

In incremental import, we need to give the --last-value to start the import from the last value onwards, but my job is to frequently import from RDBMS, I don't want to give last value manually, is there any way we can automate this process?

Upvotes: 4

Views: 9969

Answers (4)

user3123372
user3123372

Reputation: 744

This can be achieved easily with a sqoop job

1. Create a sqoop job(There is a space before "import")

sqoop job     --create JobName6 \
           -- import  \
                --connect jdbc:mysql://localhost:3306/retail_db \
                --username=username \
                --password-file /user/sqoop/password \
                --table departments \
                --target-dir /user/hive/warehouse/test.db/departments \
                --table departments \
                --split-by department_id \
                --check-column department_id \
                --incremental append \
               --last-value 0;

2. Run the sqoop job sqoop job --exec JobName6; check the values in the location in HDFS

3. Insert some data in source table (mysql) INSERT INTO departments VALUES (9,'New Data1 '),(10,'New Data2');

2. Run the sqoop job again . sqoop job --exec JobName6; check the values in the location in HDFS again .

Similarly for Hive Import

sqoop job     --create JobName1 \
           -- import  \
                --connect jdbc:mysql://localhost:3306/retail_db \
                --username=username\
                --password-file /user/sqoop/password \
                --table departments \
                --hive-import \
                --hive-table department \
                --split-by department_id \
                --check-column department_id \
                --incremental append \
               --last-value 0; 

Upvotes: 2

Vamshi
Vamshi

Reputation: 31

You can take advantage of the built-in Sqoop metastore

You can create a simple incremental import job with the following command:

sqoop job \ --create <> \ --\ import \ --connect <> \ --username <> \ --password <> \ --table <> \ --incremental append \ --check-<> \ --last-value 0

And start it with the --exec parameter:

sqoop job --exec <<Job Name>>

Sqoop will automatically serialize the last imported value back into the metastore after each successful incremental job

Upvotes: 3

K S Nidhin
K S Nidhin

Reputation: 2650

An alternate approach to @Durga Viswanath Gadiraju answer.

In case you are importing the data to a hive table , you could query the last updated value from the hive table and pass the value to the sqoop import query. You could use shell script or oozie actions for achieving this.

Shell script :

lastupdatedvalue=`hive -e 'select last_value from table` #tweak the selection query based on the logic.

sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --incremental append --last-value ${lastupdatedvalue}

Oozie approach :

  1. Hive action for the select query based on the logic to retrieve the last updated value .
  2. Sqoop action for incremental load from the captured output of previous hive action.

PFB a sudo workflow :

<workflow-app name="sqoop-to-hive" xmlns="uri:oozie:workflow:0.4">
<start to="hiveact"/>
<action name="hiveact">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <script>script.sql</script>
<capture-output/>
    </hive>    
    <ok to="sqoopact"/>
    <error to="kill"/>

<action name="sqoopact">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <command>import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --incremental append --last-value ${wf:actionData('hiveact')}</command>
     </sqoop>
    <ok to="end"/>
    <error to="kill"/>
</action>
<kill name="kill">
    <message>Action failed</message>
</kill>
<end name="end"/>

Hope this helps.

Upvotes: 6

Durga Viswanath Gadiraju
Durga Viswanath Gadiraju

Reputation: 3956

One way to get it:

Create log table in database and develop incremental import as follows

Query the log table using sqoop eval command with the last value from last run
Run the sqoop import
Update the log table with the latest valueusing sqoop eval command

You need to automate the process of sqoop eval, sqoop import and sqoop eval. You can submit any valid query to any database you have connectivity using sqoop eval. Hence you can run select query to before the import to get the last value from last run and run update query to update the log table with last value from the current run.

Upvotes: 0

Related Questions