Dan Ciborowski - MSFT

Reputation: 7237

Insert Zipped File into RedShift

I have a zipped file in S3 that I would like to insert into a Redshift database. The only approach my research has turned up is to launch an EC2 instance, move the file there, unzip it, and send it back to S3, then insert it into my Redshift table. But I am trying to do all of this with the Java SDK from an outside machine, and I do not want to have to use an EC2 instance. Is there a way to have an EMR job unzip the file, or to insert the zipped file directly into Redshift?

The files are .zip, not .gz.

Upvotes: 6

Views: 8828

Answers (3)

Sandesh Deshmane

Reputation: 2305

If your file is gzip-compressed, try the command below:

COPY mutable FROM 's3://abc/def/yourfilename.gz' CREDENTIALS 'aws_access_key_id=xxxxx;aws_secret_access_key=yyyyyy' DELIMITER ',' GZIP;

Upvotes: -3

coderz

Reputation: 4999

Add the GZIP option; see http://docs.aws.amazon.com/redshift/latest/dg/c_loading-encrypted-files.html. You can use a Java client to execute the SQL.
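A minimal sketch of issuing such a COPY from a Java client over JDBC, assuming the table name, S3 path, and credential values from the other answer are placeholders, and that the Redshift (or PostgreSQL) JDBC driver is on the classpath; the cluster endpoint below is hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RedshiftCopy {

    // Build the COPY statement; the GZIP option tells Redshift to
    // decompress the S3 object while loading it.
    static String buildCopySql(String table, String s3Path,
                               String accessKey, String secretKey) {
        return "COPY " + table
             + " FROM '" + s3Path + "'"
             + " CREDENTIALS 'aws_access_key_id=" + accessKey
             + ";aws_secret_access_key=" + secretKey + "'"
             + " DELIMITER ',' GZIP";
    }

    public static void main(String[] args) throws Exception {
        String sql = buildCopySql("mutable", "s3://abc/def/yourfilename.gz",
                                  "xxxxx", "yyyyyy");
        System.out.println(sql);

        // Hypothetical endpoint and credentials; uncomment to execute
        // against a real cluster with the JDBC driver on the classpath.
        // try (Connection c = DriverManager.getConnection(
        //         "jdbc:redshift://mycluster.example.us-east-1.redshift.amazonaws.com:5439/mydb",
        //         "dbuser", "dbpassword");
        //      Statement st = c.createStatement()) {
        //     st.execute(sql);
        // }
    }
}
```

The COPY runs entirely inside Redshift, which pulls the file from S3 itself, so the machine issuing the SQL never touches the data.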

Upvotes: 3

Joe Harris

Reputation: 14035

You cannot insert a zipped file directly into Redshift, as per Guy's comment.

Assuming this is not a one-time task, I would suggest using AWS Data Pipeline to perform this work. See this example of copying data between S3 buckets. Modify the example to unzip and then gzip your data instead of simply copying it.

Use a ShellCommandActivity to execute a shell script that performs the work. I would assume this script could invoke Java if you choose an appropriate AMI as your EC2 resource (YMMV).
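The unzip-then-regzip step itself is plain `java.util.zip` work, so a Java program invoked by the activity could do it without any external tools. A self-contained sketch, assuming a single-entry archive and hypothetical local file names (the S3 download/upload via the AWS SDK is omitted):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipToGzip {

    // Recompress the first entry of a .zip stream as gzip, the format
    // Redshift's COPY ... GZIP option can load directly from S3.
    static void zipToGzip(InputStream zipIn, OutputStream gzOut) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(zipIn);
             GZIPOutputStream gos = new GZIPOutputStream(gzOut)) {
            ZipEntry entry = zis.getNextEntry();
            if (entry == null) {
                throw new IOException("empty zip archive");
            }
            byte[] buf = new byte[8192];
            int n;
            while ((n = zis.read(buf)) != -1) {
                gos.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names; in practice you would stream the object
        // from and back to S3 with the AWS SDK.
        try (InputStream in = new FileInputStream("yourfile.zip");
             OutputStream out = new FileOutputStream("yourfilename.gz")) {
            zipToGzip(in, out);
        }
    }
}
```

Because both sides are streamed, the conversion never needs to hold the whole file in memory, which matters on a small Data Pipeline EC2 resource.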

Data Pipeline is well suited to this type of work because it starts and terminates the EC2 resource automatically, and you do not have to worry about discovering the name of the new instance in your scripts.

Upvotes: 10
