How to map multiple sources to a single target in an AWS Glue job

Question

I have a MySQL database and a Mongo database that together contain 200 tables. I am trying to connect them to Glue and merge some of the tables, ending up with 20 tables of merged data, with filters and scripts that remove some of the data before it reaches its destination.

I am using AWS Glue for this. After generating one-to-one tables with the crawlers, I want to start merging them, but when I create a job I can only select a single table as a source, which means I'd end up with 200 jobs.

Is there a way I can have a job pointing to multiple sources and map those to a single table like in the screenshot below?

Should I be using a different tool instead, or doing that step somewhere else (e.g. using DMS and generating another destination for the crawlers)?

Kishore Bharathy · Accepted Answer

You should do this at the code level: load each table into its own DataFrame or DynamicFrame, join the frames together, and then map the result to the target schema using the ApplyMapping transform. Here is a clear example of joining or merging two tables in Glue using PySpark: Join two data frames, select all columns from one and some columns from the other
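
To make the approach concrete, here is a minimal sketch of one Glue job script that reads two catalog tables, filters one of them, joins them, and writes a single merged table. It is an illustration under assumptions, not a drop-in script: the database name `my_catalog_db`, the table names `mysql_customers` and `mongo_orders`, every column name, and the S3 path are hypothetical placeholders, and the Parquet-on-S3 target is just one possible destination.

```python
# Sketch of a single Glue job with two sources and one target.
# All database, table, and column names and the S3 path are hypothetical.

# ApplyMapping takes (source column, source type, target column, target type)
# tuples; columns not listed here are dropped from the output.
MERGED_MAPPINGS = [
    ("customer_id", "long", "customer_id", "long"),
    ("name", "string", "customer_name", "string"),
    ("order_id", "long", "order_id", "long"),
    ("total", "double", "order_total", "double"),
]


def run_job():
    # Imported inside the function because awsglue is only available
    # in the Glue runtime, not in a plain Python environment.
    from awsglue.context import GlueContext
    from awsglue.transforms import ApplyMapping, Filter, Join
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # One DynamicFrame per crawled source table.
    customers = glue_context.create_dynamic_frame.from_catalog(
        database="my_catalog_db", table_name="mysql_customers"
    )
    orders = glue_context.create_dynamic_frame.from_catalog(
        database="my_catalog_db", table_name="mongo_orders"
    )

    # Drop unwanted rows before merging (hypothetical rule).
    orders = Filter.apply(
        frame=orders,
        f=lambda row: row["total"] is not None and row["total"] > 0,
    )

    # Equality join: customers.customer_id == orders.cust_id.
    joined = Join.apply(customers, orders, "customer_id", "cust_id")

    # Rename and select columns to match the target schema.
    merged = ApplyMapping.apply(frame=joined, mappings=MERGED_MAPPINGS)

    # Write the single merged table.
    glue_context.write_dynamic_frame.from_options(
        frame=merged,
        connection_type="s3",
        connection_options={"path": "s3://my-bucket/merged/customer_orders/"},
        format="parquet",
    )
```

In a Glue job you would call `run_job()` at the end of the script. The same script can repeat the read, join, and map steps for each merged target, so the number of jobs follows the number of targets rather than the number of source tables.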
