windrider

Reputation: 1

How does Sqoop treat updated rows during an import?

Suppose there is a table in Oracle (or any RDBMS) whose data is flushed out every day.

Example:

1234,Raj,Kolkata,1000,09092015

Suppose I import this row today using a standard Sqoop import and store it in HDFS as a flat file. The next day, the row is deleted from the source table. Then suppose the same record is updated (say the sal field changes from 1000 to 2000) after 7 days.

If I run the Sqoop import again, how will it treat the data and how will it be stored? Will there be two entries for the same record, or will the existing entry be updated with the newer value?

Will this record

<1234, Raj, Kolkata, 1000, 09092015>

be replaced by this one?

<1234, Raj, Kolkata, 2000, 09092015>

Upvotes: 0

Views: 966

Answers (1)

Jaime Caffarel

Reputation: 2469

If you perform incremental imports in Sqoop, the --incremental argument lets you control what happens when a row is updated as well as what happens when new rows are inserted. You have two options:

append (sqoop import (...) --incremental append). This option is used when new rows are continually added to your database and you want to import them. In this case, you need to tell Sqoop which column to check in order to detect these new rows, by means of the --check-column parameter.

lastmodified (sqoop import (...) --incremental lastmodified). This option is what you want in your example. It lets you tell Sqoop to check for updated rows in the table you already imported and set them to the new values. Bear in mind that you have to specify, by means of the --check-column parameter, the name of the column Sqoop will use to detect the updated rows, and that this column must hold a date value (for instance, date, datetime, time or timestamp). In your example you would need an extra column holding a date value, and you would have to update that value every time you change any of the other columns, in order for that row to be imported.

Of course, if you update a row but don't update that row's --check-column field, the row will not be updated in your destination.
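To make the two cases above concrete, here is a small Python sketch that models the behaviour just described. It is not Sqoop code. The function name, the last_modified column, and the row layout are hypothetical and exist only for illustration. It assumes the imported rows are reconciled with the stored ones on a key column; as far as I know, in Sqoop that corresponds to supplying a key via the --merge-key argument.

```python
from datetime import datetime


def lastmodified_import(source_rows, stored_rows, last_value,
                        check_column="last_modified", merge_key="id"):
    """Toy model of an incremental 'lastmodified' import followed by a merge.

    Only source rows whose check column is newer than last_value are picked
    up; each picked-up row replaces the stored row with the same merge key.
    """
    changed = [row for row in source_rows if row[check_column] > last_value]
    merged = {row[merge_key]: row for row in stored_rows}
    for row in changed:
        merged[row[merge_key]] = row
    return list(merged.values())


# Row as first imported (the last_modified column is a hypothetical addition).
stored = [{"id": 1234, "name": "Raj", "city": "Kolkata", "sal": 1000,
           "last_modified": datetime(2015, 9, 9)}]
last_value = datetime(2015, 9, 9)  # high-water mark of the previous import

# Case 1: sal changed AND last_modified bumped -> the stored row is replaced.
bumped = [dict(stored[0], sal=2000, last_modified=datetime(2015, 9, 16))]
after_bumped = lastmodified_import(bumped, stored, last_value)

# Case 2: sal changed but last_modified left alone -> the change is missed.
not_bumped = [dict(stored[0], sal=2000)]
after_not_bumped = lastmodified_import(not_bumped, stored, last_value)

print(after_bumped[0]["sal"], after_not_bumped[0]["sal"])  # prints: 2000 1000
```

In both cases there is still a single entry for id 1234. Whether it carries the new salary depends only on whether the check column was moved past the previous import's last value.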

I hope this helps.

Upvotes: 1
