Reputation: 2633
I'm trying to decide whether to use AWS Glue or Amazon Data Pipeline for our ETL. I need to incrementally copy several tables to Redshift. Almost all tables need to be copied with no transformation. One table requires a transformation that could be done using Spark.
Based on my understanding of these two services, the best solution is to use a combination of the two. Data Pipeline can copy everything to S3. From there, if no transformation is needed, Data Pipeline can use Redshift COPY to move the data into Redshift. Where a transformation is required, a Glue job can apply it and copy the data to Redshift.
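For the tables that need no transformation, the load step I have in mind is just a plain Redshift COPY from the S3 staging location. A minimal sketch of that statement (run here via psycopg2 purely for illustration; the cluster, bucket, table, and IAM role names are placeholders):

```python
import psycopg2

# Placeholder connection details -- substitute your own cluster endpoint and credentials.
conn = psycopg2.connect(
    host="my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="admin", password="...")

copy_sql = """
    COPY public.my_table
    FROM 's3://my-staging-bucket/exports/my_table/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)  # Redshift pulls the staged files from S3 in parallel
```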
Is this a sensible strategy or am I misunderstanding the applications of these services?
Upvotes: 1
Views: 1970
Reputation: 1978
I'm guessing it's long past the project deadline, but for people looking at this:
Use only AWS Glue. You can define Redshift as both a source and a target connection, meaning that you can read from it and dump into it. Before you do that, however, you'll need to use a Crawler to create the Glue-specific schema.
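As a rough sketch of what such a Glue job script can look like (the catalog database, table, connection name, temp dir, and column mappings below are all placeholders, not something specific to your setup):

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME", "TempDir"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the table the Crawler registered in the Glue Data Catalog
# (database/table names are placeholders).
source = glue_context.create_dynamic_frame.from_catalog(
    database="my_catalog_db", table_name="my_table")

# Example transformation: rename and cast a couple of columns.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("id", "string", "id", "int"),
              ("created", "string", "created_at", "timestamp")])

# Write the result into Redshift through a Glue connection
# ("my-redshift-connection" must be defined in the Glue console).
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="my-redshift-connection",
    connection_options={"dbtable": "public.my_table", "database": "dev"},
    redshift_tmp_dir=args["TempDir"])

job.commit()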
All of this can also be done with Data Pipeline alone using SqlActivity(s), although setting everything up might take significantly longer and won't be that much cheaper.
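For reference, a bare-bones pipeline definition using a SqlActivity would look roughly like the sketch below (expressed as a Python dict dumped to JSON for `aws datapipeline put-pipeline-definition`; the cluster, credentials, bucket, and role names are all placeholders):

```python
import json

# Rough pipeline definition: a SqlActivity that runs a Redshift COPY on a small EC2 resource.
definition = {
    "objects": [
        {"id": "Default", "name": "Default",
         "scheduleType": "ondemand",
         "role": "DataPipelineDefaultRole",
         "resourceRole": "DataPipelineDefaultResourceRole"},
        {"id": "MyRedshift", "type": "RedshiftDatabase",
         "clusterId": "my-cluster", "databaseName": "dev",
         "username": "admin", "*password": "..."},
        {"id": "MyEc2", "type": "Ec2Resource",
         "instanceType": "t2.micro", "terminateAfter": "1 Hour"},
        {"id": "LoadMyTable", "type": "SqlActivity",
         "database": {"ref": "MyRedshift"},
         "runsOn": {"ref": "MyEc2"},
         "script": ("COPY public.my_table "
                    "FROM 's3://my-staging-bucket/exports/my_table/' "
                    "IAM_ROLE 'arn:aws:iam::123456789012:role/MyCopyRole' "
                    "FORMAT AS CSV;")},
    ]
}

with open("pipeline-definition.json", "w") as f:
    json.dump(definition, f, indent=2)
```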
rant: I'm honestly surprised that AWS has focused solely on big data solutions without providing a decent tool for small/medium/large data sets. Glue is overkill and Data Pipeline is cumbersome/terrible to use. There should be a simple SQL-type Lambda!
Upvotes: 2