Reputation: 89
I am exporting the data from hive table to SQLServer table using sqoop, but at times all the records do not get exported.
I'm not sure how to identify this failure without manual intervention. I have a plan to solve this issue by writing a shell script that -
1) Stores the count of hive table in a variable. 2) Stores the count of SQL Server table in another variable. 3) Compares the value of the two variables,
if var1 != var 2
{
Delete the rows in SQL Server Server;
Rerun my sqoop script
}
else
exit;
Questions: With my current approach, How to store the count of rows of a hive table in a variable in a shell script? How to store the count of rows of a table in SQL-Server-Management studio in a shell script?
Upvotes: 1
Views: 1262
Reputation: 726
Basically your are trying to do the reconciliation after exporting the data from hive to sqlserver. To get the counts from sqlserver you can write java/scala jdbc program.
Below are the pseudo code
hiveCount = `hive -e "select count(*) from foo_table"`
sqlServerCount= java/scala jdbcCount.jar
Upvotes: 0
Reputation: 958
Sqoop provides an out of the box validation tool to ensure the correctness of import/export performed. You can do this using following flag:
--validate
Please refer https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#validation which details out all 3 validation interfaces.
As per which,
Validator - Drives the validation logic by delegating the decision to ValidationThreshold and delegating failure handling to ValidationFailureHandler. The default implementation is RowCountValidator which validates the row counts from source and the target.
You can further take actions on the validation output by implementing your own Validator as below:
The validation framework is extensible and pluggable. It comes with default implementations but the interfaces can be extended to allow custom implementations by passing them as part of the command line arguments as described below.
Upvotes: 1