Reputation: 367
I would like to export data from an Amazon Redshift table into an external table stored in Amazon S3. Every hour, I want to export rows from the Redshift source into the external table target.
What kind of options exist in AWS to achieve this?
I know that there is the UNLOAD command that allows me to export data to S3, but I don't think it would work for storing the data in an external table (which is also partitioned). Or is Amazon EMR perhaps the only way to get this working?
Upvotes: 0
Views: 2303
Reputation: 116
It's now possible to INSERT into an external table, since June 2020 I think: https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-redshift-now-supports-writing-to-external-tables-in-amazon-s3/
And here's the documentation: https://docs.aws.amazon.com/redshift/latest/dg/r_INSERT_external_table.html
Basically there are two ways (the details are in the documentation above).
Typically you specify, in your Redshift external schema (e.g. my_stg), the Glue database name, so any external table you create inside that Redshift external schema already knows the Glue catalog database name. That's good news, since the OP's question is from 2018 👍
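For reference, here is a minimal sketch of that syntax, assuming the external schema is backed by a Glue catalog database; the schema, table, Glue database, source table, and IAM role names are all placeholders:

```sql
-- Map a Redshift external schema onto an existing Glue catalog database
-- (all names and the role ARN below are placeholders).
CREATE EXTERNAL SCHEMA my_stg
FROM DATA CATALOG
DATABASE 'my_glue_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftSpectrumRole';

-- INSERT writes the query result straight to the external table's S3 location.
INSERT INTO my_stg.my_external_table
SELECT col1, col2
FROM my_redshift_source
WHERE updated_at >= DATEADD(hour, -1, GETDATE());
```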
Upvotes: 1
Reputation: 269340
Amazon Redshift Spectrum external tables are read-only. You cannot update them from Redshift (eg via INSERT commands).
Therefore, you would need a method to create the files directly in S3.
UNLOAD can certainly do this, but it cannot save the data in a partitioned structure.
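For illustration, a basic UNLOAD of the last hour of data (table, bucket, and role names are placeholders); note that the output files land under a single key prefix rather than in partition directories:

```sql
-- Export the last hour of rows to S3 as compressed, delimited files.
-- Table, bucket, and role names are placeholders.
UNLOAD ('SELECT * FROM my_redshift_source
         WHERE updated_at >= DATEADD(hour, -1, GETDATE())')
TO 's3://my-bucket/exports/my_table_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
DELIMITER ','
GZIP;
```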
Amazon EMR would, indeed, be a good option. These days it is charged per-second, so it would only need to run long enough to export the data. You could use your preferred tool (eg Hive or Spark) to export the data from Redshift, then write it into a partitioned external table.
For example, see: Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learning | AWS Big Data Blog
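As a rough sketch of what the Hive side of that could look like (all table names and S3 locations are placeholders, and it assumes the rows have first been staged to S3, e.g. via UNLOAD, rather than read from Redshift over JDBC):

```sql
-- HiveQL on EMR: expose the staged files, then rewrite them into a
-- partitioned external table. All names and locations are placeholders.
CREATE EXTERNAL TABLE IF NOT EXISTS staging_export (
  col1 STRING,
  col2 BIGINT,
  event_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/exports/';

CREATE EXTERNAL TABLE IF NOT EXISTS target_table (
  col1 STRING,
  col2 BIGINT
)
PARTITIONED BY (event_date STRING)
STORED AS PARQUET
LOCATION 's3://my-bucket/partitioned/';

-- Dynamic partitioning routes each row into its event_date=... prefix.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE target_table PARTITION (event_date)
SELECT col1, col2, event_date FROM staging_export;
```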
Another option might be AWS Glue. I'm not too familiar with it, but it can output into partitions, so this might be an even easier method to accomplish your goal!
See: Managing Partitions for ETL Output in AWS Glue - AWS Glue
Upvotes: 1