user2918922

Reputation: 23

Getting RDBMS updates into HDFS using Sqoop

I am trying to write a Sqoop job to achieve the requirement below.

  1. I have a table XYZ in which roughly 1 million new records are created and 0.5 million existing records are updated every day.
  2. I will have an end-of-day Sqoop job that should pull the delta (new records) from XYZ into HDFS and also pull the updated records and sync them with the copies already in HDFS.

I am comfortable importing the new records, but I cannot find a feasible way to sync the updated records as described in point 2.

Please help!

Thanks, Raghu

Upvotes: 1

Views: 527

Answers (1)

Bector

Reputation: 1334

For this particular scenario you can use an incremental Sqoop import in lastmodified mode, which requires the following options:
--incremental lastmodified --check-column last_modified_col --last-value "2014-10-03 15:29:48.66"

Please refer to the example below for a sample command:

sqoop job --create incr1 -- import --connect jdbc:mysql://192.168.199.137/testdb123 --username testdb123 --password testdb123 --table Paper_STAGE --incremental lastmodified --check-column last_modified_col --last-value "2014-10-03 15:29:48.66" --split-by id --hive-table paper_stage --hive-import

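Hive and HDFS are both optional targets; you can choose either one, depending on where you want to bring the data. For a plain HDFS import, drop --hive-import and --hive-table and specify --target-dir instead.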
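
To get the updated rows reconciled with what is already in HDFS (point 2 of the question), one option is to add --merge-key to the lastmodified import, so that re-imported rows replace their older copies instead of being appended as duplicates. The following is a minimal sketch under assumptions, not a tested setup: the job name xyz_delta, host dbhost, database testdb, user etl_user, the password file path, the key column id, and the target directory /data/xyz are all hypothetical, and it assumes a Sqoop 1.4.x release that accepts --merge-key together with --incremental lastmodified.

```shell
#!/bin/sh
# Sketch only: every name below (xyz_delta, dbhost, testdb, etl_user,
# /user/etl/.db_password, id, /data/xyz) is a placeholder, not taken from the question.

# Build the argument list for a saved incremental import job.
# The first "--" ends the options of `set`; the later "--" separates
# the `sqoop job` options from the `import` tool and its options.
set -- job --create xyz_delta \
  -- import \
  --connect jdbc:mysql://dbhost/testdb \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table XYZ \
  --incremental lastmodified \
  --check-column last_modified_col \
  --last-value "2014-10-03 15:29:48.66" \
  --merge-key id \
  --target-dir /data/xyz

# Print the command for review (display only; shell quoting is not reproduced).
echo "sqoop $*"

# Create the job only when explicitly requested, e.g. RUN_SQOOP=1 sh create_job.sh
if [ "${RUN_SQOOP:-0}" = "1" ]; then
  sqoop "$@"
fi
```

Once the job exists, the end-of-day run would be `sqoop job --exec xyz_delta`. A saved job records the last imported value after each successful run, so the --last-value given here should only matter for the first execution.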

Upvotes: 3
