rahul_raj

Reputation: 275

Run a SQL script file with multiple complex queries using Amazon Data Pipeline

I have just created an Amazon AWS account and I am going to use Data Pipeline to schedule my queries. Is it possible to run multiple complex SQL queries from a .sql file using the SqlActivity of Data Pipeline?

My overall objective is to process raw data from Redshift/S3 with SQL queries run by Data Pipeline and save the results to S3. Is that a feasible way to go?

Any help in this regard will be appreciated.

Upvotes: 0

Views: 1795

Answers (3)

k1r0

Reputation: 552

I managed to execute a script with multiple INSERT statements using the following AWS Data Pipeline configuration:

    {
      "id": "ExecuteSqlScript",
      "name": "ExecuteSqlScript",
      "type": "SqlActivity",
      "scriptUri": "s3://mybucket/inserts.sql",
      "database": { "ref": "rds_mysql" },
      "runsOn": { "ref": "Ec2Instance" }
    }, {
      "id": "rds_mysql",
      "name": "rds_mysql",
      "type": "JdbcDatabase",
      "username": "#{myUsername}",
      "*password": "#{*myPassword}",
      "connectionString" : "#{myConnStr}",
      "jdbcDriverClass": "com.mysql.jdbc.Driver",
      "jdbcProperties": ["allowMultiQueries=true","zeroDateTimeBehavior=convertToNull"]
    },

It is important to allow the MySQL driver to execute multiple queries with allowMultiQueries=true; the S3 path of the script is provided by scriptUri.

Upvotes: 0

Kishan Pandey

Reputation: 189

The input of your SQL queries will be a single data node and the output will be a single data file. Data Pipeline provides only one "Select query" field, in which you write your extraction/transformation query. I don't think there is a use case for a file with multiple queries here.

However, if you want to make your pipeline configurable, you can add "parameters" and "values" objects to your pipeline definition JSON:

    {
      "objects": [
        {
          "selectQuery": "#{myRdsSelectQuery}"
        }
      ],
      "parameters": [
        {
          "description": "myRdsSelectQuery",
          "id": "myRdsSelectQuery",
          "type": "String"
        }
      ],
      "values": {
        "myRdsSelectQuery": "Select Query"
      }
    }

If you want to execute and schedule multiple SQL scripts, you can do it with a ShellCommandActivity.
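As a rough sketch of that approach (the bucket name, script path, instance id, and connection string below are placeholders, not values from the question), a ShellCommandActivity can copy the script down from S3 and feed it to the psql client, which happily runs a file containing many statements:

    {
      "id": "RunSqlScripts",
      "name": "RunSqlScripts",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "Ec2Instance" },
      "command": "aws s3 cp s3://mybucket/scripts/transform.sql /tmp/transform.sql && psql \"#{myRedshiftConnStr}\" -f /tmp/transform.sql"
    }

The EC2 resource needs the psql client installed and network access to the Redshift cluster for this to work.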

Upvotes: 0

simplycoding

Reputation: 2977

Yes. If you plan on moving the data from Redshift to S3, you need to use the UNLOAD command, documented here: http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html
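For illustration (the table, bucket prefix, and IAM role ARN below are placeholders), UNLOAD wraps your transformation query and writes its result set to S3; note that single quotes inside the query text must be doubled:

    UNLOAD ('SELECT col1, col2 FROM my_schema.raw_events WHERE event_date >= ''2016-01-01''')
    TO 's3://mybucket/output/events_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
    DELIMITER ','
    GZIP
    ALLOWOVERWRITE;

Redshift writes the output in parallel as multiple files under the given prefix, one or more per slice.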

Upvotes: 0
