Reputation: 33
I have written a Sqoop script:
HADOOP_USER_NAME=hdfs sqoop import \
  --connect jdbc:mysql://cmsmaster.cy9mnipcdof2.us-east-1.rds.amazonaws.com/db \
  --username user \
  --password-file /user/password/dbpass.txt \
  --fields-terminated-by ',' \
  --target-dir /user/db/sqoop_internal \
  --delete-target-dir \
  --hive-import \
  --hive-overwrite \
  --hive-table sqoop_internal \
  --query 'SOME_QUERY where $CONDITIONS' \
  --split-by id
This imports the result of the query into a Hive table, overwriting the table's previous contents.
Now I need to modify this script so that it no longer overwrites the whole Hive table. Instead, it should overwrite only one partition of that table. How can I do that?
Upvotes: 1
Views: 189
Reputation: 624
From your question, I understand that you might need to do a Sqoop merge.
You need to remove:
--delete-target-dir and --hive-overwrite
And add:
--incremental lastmodified --check-column modified --last-value '2018-03-08 00:00:00' --merge-key yourPrimaryKey
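Put together, the change described above might look like the sketch below. It is only an illustration: `modified` and `yourPrimaryKey` are placeholders for your own timestamp column and primary key, `SOME_QUERY` is the elided query from the question, and the documented `--password-file` spelling is used. The command is wrapped in a function (the name `import_incremental` is made up here) so that nothing runs until you call it. I believe some Sqoop releases reject `--incremental lastmodified` when it is combined with `--hive-import`, so check this against your version before relying on it.

```shell
# Sketch only: the question's import with --delete-target-dir and
# --hive-overwrite removed and the incremental/merge flags added.
# 'modified' and 'yourPrimaryKey' are placeholders, not real column names.
import_incremental() {
  HADOOP_USER_NAME=hdfs sqoop import \
    --connect jdbc:mysql://cmsmaster.cy9mnipcdof2.us-east-1.rds.amazonaws.com/db \
    --username user \
    --password-file /user/password/dbpass.txt \
    --fields-terminated-by ',' \
    --target-dir /user/db/sqoop_internal \
    --hive-import \
    --hive-table sqoop_internal \
    --query 'SOME_QUERY where $CONDITIONS' \
    --split-by id \
    --incremental lastmodified \
    --check-column modified \
    --last-value '2018-03-08 00:00:00' \
    --merge-key yourPrimaryKey
}
```

Calling `import_incremental` runs the import. On later runs, `--last-value` would need to be moved forward to the timestamp of the previous import.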
You can find more information in the official documentation: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_merge_literal
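If the goal is literally to replace a single partition rather than merge changed rows, the import tool also documents `--hive-partition-key` and `--hive-partition-value`. The sketch below assumes the Hive table is partitioned by one string column and that the query does not select that column. The partition column `load_date`, its value `2018-03-08`, and the function name `import_partition` are hypothetical. As far as I understand, keeping `--hive-overwrite` then makes the Hive load overwrite only the named partition, but I have not verified this against your setup.

```shell
# Sketch only: import the query result into one static Hive partition.
# 'load_date' and '2018-03-08' are hypothetical; use your own partition
# column and value. SOME_QUERY is the elided query from the question.
import_partition() {
  HADOOP_USER_NAME=hdfs sqoop import \
    --connect jdbc:mysql://cmsmaster.cy9mnipcdof2.us-east-1.rds.amazonaws.com/db \
    --username user \
    --password-file /user/password/dbpass.txt \
    --fields-terminated-by ',' \
    --target-dir /user/db/sqoop_internal \
    --delete-target-dir \
    --hive-import \
    --hive-overwrite \
    --hive-table sqoop_internal \
    --hive-partition-key load_date \
    --hive-partition-value '2018-03-08' \
    --query 'SOME_QUERY where $CONDITIONS' \
    --split-by id
}
```

Each call loads exactly one partition value, so importing several partitions would mean one run per value, with the query filtered to the matching rows.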
Upvotes: 1