How to identify Sqoop failures and handle them?

Question

I am exporting the data from hive table to SQLServer table using sqoop, but at times all the records do not get exported.

I'm not sure how to identify this failure without manual intervention. I have a plan to solve this issue by writing a shell script that -

1) Stores the count of hive table in a variable. 2) Stores the count of SQL Server table in another variable. 3) Compares the value of the two variables,

    if var1 != var 2 
    { 
       Delete the rows in SQL Server Server;
        Rerun my sqoop script
    }
    else
       exit;

Questions: With my current approach, How to store the count of rows of a hive table in a variable in a shell script? How to store the count of rows of a table in SQL-Server-Management studio in a shell script?

shriyog · Accepted Answer

Sqoop provides an out of the box validation tool to ensure the correctness of import/export performed. You can do this using following flag:

--validate

Please refer https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#validation which details out all 3 validation interfaces.

As per which,

Validator - Drives the validation logic by delegating the decision to ValidationThreshold and delegating failure handling to ValidationFailureHandler. The default implementation is RowCountValidator which validates the row counts from source and the target.

You can further take actions on the validation output by implementing your own Validator as below:

The validation framework is extensible and pluggable. It comes with default implementations but the interfaces can be extended to allow custom implementations by passing them as part of the command line arguments as described below.

How to identify Sqoop failures and handle them?

Answers (2)

Related Questions