Reputation: 410
Is there a way to use sqoop to do incremental exports ? I am using Hcatalog integration for sqoop.I tried using the --last-value, --check-column options which are used for incremental import, but sqoop gave me error that the options were invalid.
Upvotes: 1
Views: 3298
Reputation: 522
I have not seen incremental sqoop export arguments. The other way you could try is to create a contol_table in hive where you keep log of the table name & timestamp when it was last exported every time.
create table if not exists control_table (
table_name string,
export_date timestamp
);
insert into control_table 'export_table1' as table_name, from_unixtime(unix_timestamp()) as export_date from control_table;
If export_table1 is the table you want to export incrementally and assuming if have already executed above two statements.
--execute below at once
--get the timestamp when the table was last executed
create temporary table control_table_now as select table_name, max(export_date) as last_export_date from control_table group by table_name;
--get incremental rows
create table new_export_table1 as select field1, field2, field3, .... timestamp1 from export_table1 e, control_table_now c where c.table_name = 'export_table1' and e.timestamp1 >= c.last_export_date;
--append the control_table for next process
insert into control_table 'export_table1' as table_name, from_unixtime(unix_timestamp()) as export_date from control_table;
Now, export the new_export_table1 table which is incrementally created using sqoop export command.
Upvotes: 2
Reputation: 742
By default sqoop does not support incremental update with hcatalog integration, when we try it gives following error
Append mode for imports is not compatible with HCatalog. Please remove the parameter--append-mode at org.apache.sqoop.tool.BaseSqoopTool.validateHCatalogOptions(BaseSqoopTool.java:1561)
you can use query option to make it work. as described in this hortonworks document
Upvotes: 0