Reputation: 1810
I am doing a incremental sqooping from to hdfs oracle giving where condition like
(LST_UPD_TMST >TO_TIMESTAMP('2016-05-31T18:55Z', 'YYYY-MM-DD"T"HH24:MI"Z"')
AND LST_UPD_TMST <= TO_TIMESTAMP('2016-09-13T08:51Z', 'YYYY-MM-DD"T"HH24:MI"Z"'))
But it is not using the index. How can I force an Index so that sqoop can be faster by considering only filtered records.
What is the best option to do incremental sqoop. Table size in oracle is in TBs. Table has billions rows and after where condition it is in some million
Upvotes: 2
Views: 482
Reputation: 35424
You can use
--where
or--query
with where condition in select to filter import results
I was not sure about your sqoop full command, just give a try in this way
sqoop import
--connect jdbc:oracle:thin:@//db.example.com/dbname \
--username dbusername \
--password dbpassword \
--table tablename \
--columns "column,names,to,select,in,comma,separeted" \
--where "(LST_UPD_TMST >TO_TIMESTAMP('2016-05-31T18:55Z', 'YYYY-MM-DD\"T\"HH24:MI\"Z\"') AND LST_UPD_TMST <= TO_TIMESTAMP('2016-09-13T08:51Z', 'YYYY-MM-DD\"T\"HH24:MI\"Z\"'))" \
--target-dir {hdfs/location/to/save/data/from/oracle} \
--incremental lastmodified \
--check-column LST_UPD_TMST \
--last-value {from Date/Timestamp to Sqoop in incremental}
Check more details about sqoop incremental load
For incremental imports Sqoop saved job is recommended to maintain --last-value
automatically.
sqoop job --create {incremental job name} \
-- import
--connect jdbc:oracle:thin:@//db.example.com/dbname \
--username dbusername \
--password dbpassword \
--table tablename \
--columns "column,names,to,select,in,comma,separeted" \
--incremental lastmodified \
--check-column LST_UPD_TMST \
--last-value 0
Here
--last-value 0
to import from start for first time then latest value will be passed automatically in next invocation by sqoop job
Upvotes: 2