Save sqoop incremental import id

Question

I have a lot of Sqoop jobs running on AWS EMR, but sometimes I need to shut the cluster down.

Is there a way to save the last id from an incremental import, maybe locally, and upload it to S3 via a cron job?

My first idea is that, when I create the job, a bash script queries Redshift, where my data is stored, and gets the last id or last_modified value.

Another idea is to take the output of sqoop job --show $jobid, filter out the last_id parameter, and use it to create the job again.
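The second idea could be sketched roughly as follows. This assumes the output of sqoop job --show contains a line of the form "incremental.last.value = <value>"; the helper name extract_last_value, the file name last_id.txt, and the bucket in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash

# Read `sqoop job --show` output on stdin and print the saved last value.
# Assumption: the value appears on a line "incremental.last.value = <value>".
extract_last_value() {
  sed -n 's/^[[:space:]]*incremental\.last\.value[[:space:]]*=[[:space:]]*//p' | head -n 1
}

# Hypothetical usage (needs a working Sqoop install and an existing saved job):
#   sqoop job --show "$jobid" | extract_last_value > last_id.txt
#   aws s3 cp last_id.txt s3://your-bucket/sqoop/last_id.txt
```

The saved value could then be passed back through --last-value when the job is recreated on a new cluster.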

But I don't know if Sqoop offers an easier way to do this.

Carleto · Accepted Answer

Solution

I changed sqoop-site.xml so that the Sqoop metastore points to my MySQL instance instead of the local one.

Steps

Create the MySQL instance and run these queries:

    CREATE TABLE SQOOP_ROOT (version INT, propname VARCHAR(128) NOT NULL, propval VARCHAR(256), CONSTRAINT SQOOP_ROOT_unq UNIQUE (version, propname));
    INSERT INTO SQOOP_ROOT VALUES(NULL, 'sqoop.hsqldb.job.storage.version', '0');

Edit the original sqoop-site.xml, adding your MySQL endpoint, user, and password:

  
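  <property>
    <name>sqoop.metastore.client.enable.autoconnect</name>
    <value>true</value>
    <description>If true, Sqoop will connect to a local metastore
      for job management when no other metastore arguments are
      provided.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:mysql://your-mysql-instance-endpoint:3306/database</value>
    <description>The connect string to use when connecting to a
      job-management metastore. If unspecified, uses ~/.sqoop/.
      You can specify a different path here.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>${sqoop-user}</value>
    <description>The username to bind to the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value>${sqoop-pass}</value>
    <description>The password to bind to the metastore.
    </description>
  </property>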

The first time you run sqoop job --list, it returns no jobs. But once the jobs have been created, shutting down the EMR cluster no longer loses the Sqoop metadata for the saved jobs, because it is stored in MySQL.

In EMR, a Bootstrap Action can automate this configuration at cluster creation.
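A minimal sketch of what such a script might do is below. It assumes sqoop-site.xml already exists when the script runs, that it has a single line containing the closing </configuration> tag, and that the metastore properties are not already set uncommented in the file. The function names and the /etc/sqoop/conf path in the usage comment are assumptions, not something confirmed above, and the values are not XML-escaped.

```shell
#!/usr/bin/env bash

# Print the three metastore <property> entries.
# Arguments: JDBC URL, username, password (placeholders supplied by the caller).
render_metastore_properties() {
  local url="$1" user="$2" pass="$3"
  cat <<EOF
  <property>
    <name>sqoop.metastore.client.enable.autoconnect</name>
    <value>true</value>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>${url}</value>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>${user}</value>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value>${pass}</value>
  </property>
EOF
}

# Rewrite the given sqoop-site.xml with the entries placed just before
# the closing </configuration> tag.
patch_sqoop_site() {
  local site_file="$1" url="$2" user="$3" pass="$4"
  local tmp
  tmp=$(mktemp)
  {
    # Copy everything except the line holding the closing tag.
    grep -v '</configuration>' "$site_file"
    render_metastore_properties "$url" "$user" "$pass"
    echo '</configuration>'
  } > "$tmp"
  mv "$tmp" "$site_file"
}

# Hypothetical usage (the path is an assumption; check it on your cluster):
#   patch_sqoop_site /etc/sqoop/conf/sqoop-site.xml \
#     "jdbc:mysql://your-mysql-instance-endpoint:3306/database" "$SQOOP_USER" "$SQOOP_PASS"
```

Whether sqoop-site.xml is already in place when a bootstrap action runs depends on how the cluster is provisioned, so it is worth verifying that on a test cluster before relying on it.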
