Reputation: 150
I'm new to this technology. We receive a file which may contain backdated records, and from it we have to load the data into a Hive table that is insert-only (partitioned on trans_date). I need to know what the mechanism should be for inserting records into the table when trans_date is backdated. trans_date (column) is the transaction date and record_date (column) is the date on which the record is inserted into the table.
Upvotes: 2
Views: 292
Reputation: 38325
You can do this in a number of ways, using different tools.
Create an increment (staging) table on top of the new files' directory, or use the LOAD DATA
command to put the files into the increment table, or use the hadoop fs -cp
command for the same purpose.
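For example, a minimal sketch of the first option (the table name, columns, file format and location are assumptions, adjust them to your actual files):

-- External staging table pointing at the directory where new files arrive
CREATE EXTERNAL TABLE IF NOT EXISTS incr_table (
  col1       STRING,
  col2       STRING,
  trans_date DATE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/incoming/trans';

-- Or, if you prefer the LOAD approach, move files into an existing increment table:
-- LOAD DATA INPATH '/data/incoming/trans/file1.csv' INTO TABLE incr_table;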
Well, you have a table with incremental data now.
The next step is to load the data into the proper partitions of the main table (called main_table below). Since you do insert only, no updates, use:

INSERT INTO TABLE main_table PARTITION(trans_date)
SELECT col1, col2, trans_date FROM incr_table; -- filter if necessary
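Note that this is a dynamic partition insert (the last column of the SELECT, trans_date, decides the partition each row goes to), so dynamic partitioning has to be enabled first:

-- allow Hive to route rows into partitions derived from the data itself
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

Because the insert is INTO (not OVERWRITE), rows with a backdated trans_date are simply appended to the already existing partition for that date, which is what you want for an insert-only table.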
Afterwards, drop incr_table, or remove only the data in the increment table's location and re-use the table. Alternatively, partition incr_table by record_date (or file_date) if applicable and never drop it: load each new batch into its own partition and select only that partition, as sketched below.
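A rough sketch of that alternative, again with assumed names and an example date:

-- Permanent increment table, one partition per received batch
CREATE TABLE IF NOT EXISTS incr_table (
  col1       STRING,
  col2       STRING,
  trans_date DATE
)
PARTITIONED BY (record_date DATE)
STORED AS TEXTFILE;

-- Load a day's file(s) into their own partition
LOAD DATA INPATH '/data/incoming/trans/2018-06-20'
INTO TABLE incr_table PARTITION (record_date='2018-06-20');

-- Insert only the newly loaded partition into the main table
-- (dynamic partition settings as above)
INSERT INTO TABLE main_table PARTITION(trans_date)
SELECT col1, col2, trans_date
FROM incr_table
WHERE record_date = '2018-06-20';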
If you need to update old records with incremental data, see this answer: https://stackoverflow.com/a/37744071/2700344
Upvotes: 1