flowoo

Reputation: 367

Amazon Redshift table to external table in S3 every hour

I would like to export data from an Amazon Redshift table into an external table stored in Amazon S3. Every hour, I want to export rows from the Redshift source into the external table target.

What kind of options exist in AWS to achieve this?

I know that there is the UNLOAD command that allows me to export data to S3, but I think it would not work for storing the data in an external table (which is partitioned, too). Or is Amazon EMR the only way to get this working?

Upvotes: 0

Views: 2303

Answers (2)

Emanuel Oliveira

Reputation: 116

It's now possible to INSERT into an external table, since June 2020 I think: https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-redshift-now-supports-writing-to-external-tables-in-amazon-s3/

And here's the documentation: https://docs.aws.amazon.com/redshift/latest/dg/r_INSERT_external_table.html

Basically, there are two ways:

  • INSERT INTO external_schema.table_name { select_statement }, or
  • CREATE EXTERNAL TABLE AS { SELECT }

Typically, you specify the Glue database name when you create your external schema in Redshift (e.g. my_stg), so any external table you create inside that external schema already knows the Glue catalog database name. That's good news, since the OP's question is from 2018 👍
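A minimal sketch of both approaches, assuming a hypothetical external schema my_stg mapped to a Glue database, plus placeholder bucket, role, and table names (adjust everything to your own environment):

```sql
-- One-time setup: external schema pointing at a Glue catalog database
-- (my_glue_db, my-bucket, and the IAM role ARN are placeholders).
CREATE EXTERNAL SCHEMA my_stg
FROM DATA CATALOG
DATABASE 'my_glue_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';

-- Option 1: insert the last hour of rows into an existing external table.
INSERT INTO my_stg.events_ext
SELECT event_id, payload, event_date
FROM public.events
WHERE event_ts >= DATEADD(hour, -1, GETDATE());

-- Option 2: create the external table directly from a query,
-- partitioned and stored as Parquet in S3.
CREATE EXTERNAL TABLE my_stg.events_ext
PARTITIONED BY (event_date)
STORED AS PARQUET
LOCATION 's3://my-bucket/events/'
AS SELECT event_id, payload, event_date FROM public.events;
```

For the hourly schedule, either statement can be driven by the Redshift scheduler or any external cron-style trigger.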

Upvotes: 1

John Rotenstein

Reputation: 269340

Amazon Redshift Spectrum external tables are read-only. You cannot update them from Redshift (eg via INSERT commands).

Therefore, you would need a method to create the files directly in S3.

UNLOAD can certainly do this, but it cannot save the data in a partition structure.
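For reference, a basic UNLOAD to S3 looks like the sketch below (bucket, IAM role, and table names are placeholders, not from the question); note there is no partition clause, which is exactly the limitation described above:

```sql
-- Export the last hour of rows as gzipped CSV files under a flat S3 prefix.
-- All identifiers and the role ARN are hypothetical placeholders.
UNLOAD ('SELECT * FROM public.events WHERE event_ts >= DATEADD(hour, -1, GETDATE())')
TO 's3://my-bucket/exports/events_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER ','
GZIP;
```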

Amazon EMR would, indeed, be a good option. These days it is charged per-second, so it would only need to run long enough to export the data. You could use your preferred tool (eg Hive or Spark) to export the data from Redshift, then write it into a partitioned external table.

For example, see: Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learning | AWS Big Data Blog

spark-redshift

Another option might be AWS Glue. I'm not too familiar with it, but it can output into partitions, so this might be an even easier method to accomplish your goal!

See: Managing Partitions for ETL Output in AWS Glue - AWS Glue

Upvotes: 1
