Reputation: 1
Is it possible to do an incremental load with Sqoop from Teradata on the basis of a load-time column whose datatype is TIMESTAMP(6)? How does Sqoop handle the format while loading into HDFS? I am getting a SQL exception: expecting something like '(' and ')'.
Upvotes: 0
Views: 102
Reputation: 1410
So you have one column, loadtime [type TIMESTAMP(6)].
While pulling the data to HDFS you have to build a three-step process, and you also have to create partitions on the HDFS side. I prefer this way because you don't have to mess with older data, but you could also use the --incremental
option in Sqoop.
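For reference, Sqoop's built-in incremental mode for a timestamp column can be sketched like this. This is a hedged template, not the author's exact command: the host, credentials, paths, and merge key below are placeholders I introduced.

```shell
# Sketch of Sqoop's built-in incremental mode. "lastmodified" is the mode
# meant for timestamp check columns. All <...> values are placeholders.
sqoop import \
  --connect jdbc:teradata://<td-host>/DATABASE=<db> \
  --username <user> --password-file /user/<user>/td.password \
  --table teradata_table \
  --incremental lastmodified \
  --check-column loadtime \
  --last-value "2020-12-31 23:59:59.999999" \
  --target-dir /<some-path>/teratable_in_hive \
  --merge-key <primary_key_column>
```

With lastmodified mode, Sqoop itself tracks and prints the new --last-value after each run, which replaces the manual max-loadtime bookkeeping described below.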
For the very first pull you can start from 0000-00-00 00:00:00.000000. After that, your query condition will be
WHERE loadtime > ${max_of_loadtime_in_hdfs}
where you pass max_of_loadtime_in_hdfs
as an argument. Note that the timestamp format differs between Teradata and Hive.
It's better to cast it to a string while saving it to Hive.
Then, when comparing against your Hive loadtime (string type), you have to cast it back to TIMESTAMP(6).
So your condition on the Sqoop end will be (whichever works for you):
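Storing loadtime as a zero-padded string in Hive works because lexicographic comparison of 'yyyy-MM-dd HH:mm:ss.ffffff' strings matches chronological order, so max() in Hive still returns the latest pull time. A quick sanity check of that ordering property:

```shell
# Zero-padded timestamp strings sort lexicographically in chronological
# order, which is why max() on the Hive string column still works.
old="2020-01-05 10:00:00.000123"
new="2020-12-31 23:59:59.999999"
if [[ "$new" > "$old" ]]; then
  echo "string order matches time order"
fi
```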
WHERE loadtime > CAST('${max_of_loadtime_in_hdfs}' AS TIMESTAMP(6))
WHERE loadtime > TO_TIMESTAMP('${max_of_loadtime_in_hdfs}', 'yyyy-mm-dd hh24:mi:ss.ff6')
max_of_loadtime_in_hdfs will be something like this:
2020-12-31 23:59:59.999999
So in your final sqoop query:
SELECT
column_A,
column_B,
...
column_N,
to_char(loadtime, 'yyyy-mm-dd hh24:mi:ss.ff6') loadtime
FROM teradata_table
WHERE loadtime > TO_TIMESTAMP('${max_of_loadtime_in_hdfs}', 'yyyy-mm-dd hh24:mi:ss.ff6')
AND $CONDITIONS -- this is required by Sqoop, don't remove it
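Putting it together, the free-form-query version of the Sqoop command might look like this sketch. The connection details and the split column are placeholder assumptions, not from the original answer:

```shell
# Free-form query import: $CONDITIONS is substituted by Sqoop per mapper,
# so it must stay inside the WHERE clause (escaped here so the shell does
# not expand it). All <...> values are placeholders.
MAX_LOADTIME="2020-12-31 23:59:59.999999"
sqoop import \
  --connect jdbc:teradata://<td-host>/DATABASE=<db> \
  --username <user> --password-file /user/<user>/td.password \
  --query "SELECT column_A, column_B, column_N, \
           to_char(loadtime, 'yyyy-mm-dd hh24:mi:ss.ff6') loadtime \
           FROM teradata_table \
           WHERE loadtime > TO_TIMESTAMP('${MAX_LOADTIME}', 'yyyy-mm-dd hh24:mi:ss.ff6') \
           AND \$CONDITIONS" \
  --split-by <primary_key_column> \
  --target-dir /<some-path>/teratable_in_hive/pull_seq_no=<n>
```

Writing each pull into its own pull_seq_no=<n> directory lines up with the partitioned Hive table below.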
Hive table definition:
CREATE EXTERNAL TABLE teratable_in_hive (
column_A <column_A_datatype>,
column_B <column_B_datatype>,
...
column_N <column_N_datatype>,
loadtime string
)
PARTITIONED BY (pull_seq_no <type can be int or string>)
STORED AS <file_format TEXTFILE / PARQUET>
LOCATION 'hdfs:///<some-path>/teratable_in_hive';
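Since the table is external and partitioned, each new pull directory has to be registered before Hive can see it, and the next max loadtime read back for the following run. A hedged sketch, with placeholder paths and partition values:

```shell
# Register the newly loaded partition, then read back the max loadtime
# to drive the next incremental pull. All <...> values are placeholders.
hive -e "ALTER TABLE teratable_in_hive ADD IF NOT EXISTS PARTITION (pull_seq_no=<n>)
         LOCATION 'hdfs:///<some-path>/teratable_in_hive/pull_seq_no=<n>';"

# -S suppresses Hive's progress output so only the result is captured.
MAX_LOADTIME=$(hive -S -e "SELECT max(loadtime) FROM teratable_in_hive;")
echo "next \${max_of_loadtime_in_hdfs}: ${MAX_LOADTIME}"
```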
Upvotes: 0