Tanmay Bhatnagar

Reputation: 2470

Convert/compress SQL database to parquet format for RedShift

I have 3 SQL databases in an S3 bucket on AWS that I would like to load into Redshift. I learned that converting them to a big-data format like Parquet would be much better, since making queries on Redshift costs money and performance in those formats is just better overall. How do I convert my databases to those formats? Please ask for any further info that might be required. Thanks

Upvotes: 0

Views: 165

Answers (1)

John Rotenstein

Reputation: 269276

Amazon Redshift has its own internal format. Once the data is loaded into Redshift, it is "in the database" so you don't need to worry about formats.
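For example, loading data into Redshift is normally done with the COPY command. A minimal sketch, assuming your exports are CSV files under a hypothetical s3://my-bucket/exports/ prefix, that the target table already exists, and that the IAM role shown is attached to your cluster:

    -- Load CSV exports from S3 into an existing Redshift table.
    -- Table name, bucket path, and IAM role ARN are placeholders.
    COPY sales
    FROM 's3://my-bucket/exports/sales/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS CSV
    IGNOREHEADER 1;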

If you are only doing occasional queries, you could instead use Amazon Athena. Athena allows you to write SQL statements against data stored in Amazon S3 without having to "load" the data. Basically, Athena goes to the data rather than the data going to the database.
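Pointing Athena at files in S3 is just a matter of declaring a table over them. A sketch with placeholder table, column, and bucket names:

    -- Declare an external table over CSV files already in S3.
    -- Nothing is loaded; Athena reads the files in place at query time.
    CREATE EXTERNAL TABLE sales_csv (
      id        bigint,
      sale_date string,
      amount    double
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LOCATION 's3://my-bucket/exports/sales/';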

When using Athena, you are charged for the amount of data read from disk. Therefore, queries can run at a lower cost if the data is compressed. Queries can also run faster and at lower cost if the data is stored in a columnar format (e.g. Parquet, ORC) because Athena can jump straight to the relevant data rather than having to read it all in from disk.
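Athena can also do the conversion you asked about: a CTAS (CREATE TABLE AS SELECT) statement can rewrite the CSV data as compressed Parquet. A sketch, again with placeholder names:

    -- Rewrite the CSV table as Snappy-compressed Parquet in S3.
    -- The output location is a placeholder and must be an empty prefix.
    CREATE TABLE sales_parquet
    WITH (
      format = 'PARQUET',
      parquet_compression = 'SNAPPY',
      external_location = 's3://my-bucket/parquet/sales/'
    ) AS
    SELECT * FROM sales_csv;

The resulting sales_parquet table can then be queried directly from Athena, or the Parquet files it wrote can later be loaded into Redshift if you decide you need a cluster after all.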

In contrast, Amazon Redshift is charged based upon the size of the cluster you run. It has no additional cost for running the actual queries.

For more information about optimizing use of Amazon Athena, see: Analyzing Data in S3 using Amazon Athena | AWS Big Data Blog

Upvotes: 1
