ayushman
ayushman

Reputation: 1

Sqoop incremental

Is it possible to have incremental load by Sqoop for teradata on the basic of load time whose datatype is timestamp(6)? How it handles the format while loading in hdfs:

sql exxeption:- expecting something '(' and ').

Upvotes: 0

Views: 102

Answers (1)

SnigJi
SnigJi

Reputation: 1410

So you have one column which is loadtime [type timestamp(6)].
While pulling the data to HDFS you have to create a 3 step process. Also have to create partition in hdfs end.
I prefer this way because you don’t to mess with older data.
But you could use --incremental option in sqoop.

  1. Fetch the max of loadtime present in the hdfs. If that is 1st time pass a default value something like 0000-00-00 00:00:00.000000
  2. While pulling data via sqoop add one condition pass that WHERE loadtime > ${max_of_loadtime_in_hdfs}. You will be passing max_of_loadtime_in_hdfs as an argument.
  3. Save the data in a newer partition

Now timestamp format would be different for teradata and hive.
It's better to cast to string while saving it to hive.

So while checking the loadtime with your hive loadtime(string type) you have to cast back to timestamp(6).
So in your condition in sqoop end will be (which ever works for you):

  • WHERE loadtime > CAST(${max_of_loadtime_in_hdfs} AS TIMESTAMP(6))
  • WHERE loadtime > TO_TIMESTAMP(${max_of_loadtime_in_hdfs}, 'yyyy-mm-dd hh24:mi:ss.ff6')

max_of_loadtime_in_hdfs will be something like this 2020-12-31 23:59:59.999999

So in your final sqoop query:

SELECT
    column_A,
    column_B,
    ...
    column_N,
    to_char(loadtime, 'yyyy-mm-dd hh24:mi:ss.ff6') loadtime
FROM teradata_table
WHERE loadtime > TO_TIMESTAMP(${max_of_loadtime_in_hdfs}, 'yyyy-mm-dd hh24:mi:ss.ff6') 
AND $CONDITIONS --this is for sqoop don’t remove this

hive table definition:

CREATE EXTERNAL TABLE teratable_in_hive (
    column_A <column_A_datatype>,
    column_B <column_B_datatype>,
    ...
    column_N <column_N_datatype>,
    loadtime string
)
PARTITIONED BY (pull_seq_no <type can be int or string>)
STORED AS <file_format TEXTFILE / PARQUET>
LOCATION 'hdfs:///<some-path>/teratable_in_hive'

Upvotes: 0

Related Questions