Naguib Ihab

Reputation: 4506

How to map multiple sources to a single target in an AWS Glue job

I have a MySQL database and a MongoDB database that together contain 200 tables. I am trying to connect them to Glue and merge some of those tables so that I end up with 20 tables of merged data. Along the way I need to apply some filters and scripts that remove part of this data before it reaches its destination.

I am using AWS Glue for this. After generating one-to-one tables with the crawlers, I want to start merging those tables together. However, when I create a job I can only select a single table as the source, which means I would end up with 200 jobs.

Is there a way I can have a job pointing to multiple sources and map those to a single table like in the screenshot below?


Should I be using a different tool instead, or doing this step somewhere else (e.g. using DMS and generating another destination for the crawlers)?

Upvotes: 3

Views: 7818

Answers (1)

Kishore Bharathy

Reputation: 451

You should do this at the code level: load each table into its own DataFrame or DynamicFrame, join those frames together, and then map the result onto the target schema with the ApplyMapping transform. Here is a clear example of joining or merging two tables in Glue using PySpark: Join two data frames, select all columns from one and some columns from the other
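A Glue job script is not limited to one source: it can call glueContext.create_dynamic_frame.from_catalog(database=..., table_name=...) once per crawled table, combine the frames with Join.apply(frame1, frame2, keys1, keys2), drop unwanted rows with Filter.apply, and reshape the result with ApplyMapping.apply(frame=..., mappings=[...]). Because that code only runs inside a Glue environment, below is a minimal plain-Python sketch of the same join, filter, and map flow on in-memory rows. It assumes two hypothetical tables, customers (from MySQL) and orders (from MongoDB); join_rows and apply_mapping are illustrative helpers, not Glue APIs, although the mapping tuples follow ApplyMapping's (source, source type, target, target type) shape.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]

# Hypothetical sample rows standing in for two crawled catalog tables:
# "customers" (from MySQL) and "orders" (from MongoDB).
customers: List[Row] = [
    {"id": 1, "name": "Ada", "country": "UK"},
    {"id": 2, "name": "Linus", "country": "FI"},
]
orders: List[Row] = [
    {"order_id": "a1", "customer_id": 1, "total": "19.99"},
    {"order_id": "a2", "customer_id": 1, "total": "5.00"},
    {"order_id": "b1", "customer_id": 2, "total": "42.50"},
]

# Converters for the few type names used here (an assumption; Glue's
# ApplyMapping understands many more types than these).
CASTS: Dict[str, Callable[[Any], Any]] = {"int": int, "string": str, "double": float}


def join_rows(left: Iterable[Row], right: Iterable[Row],
              left_key: str, right_key: str) -> List[Row]:
    """Inner hash join: index left rows by key, then probe with each right row."""
    index: Dict[Any, List[Row]] = {}
    for left_row in left:
        index.setdefault(left_row[left_key], []).append(left_row)
    joined: List[Row] = []
    for right_row in right:
        for left_row in index.get(right_row[right_key], []):
            # Right-hand values win if both rows share a column name.
            joined.append({**left_row, **right_row})
    return joined


Mapping = Tuple[str, str, str, str]  # (source, source_type, target, target_type)


def apply_mapping(rows: Iterable[Row], mappings: List[Mapping]) -> List[Row]:
    """Rename and cast columns; any column not listed in mappings is dropped."""
    return [
        {target: CASTS[target_type](row[source])
         for source, _source_type, target, target_type in mappings}
        for row in rows
    ]


# Join the two sources, filter, then map onto the single target schema.
merged = join_rows(customers, orders, left_key="id", right_key="customer_id")
merged = [row for row in merged if row["country"] == "UK"]  # example filter step
target = apply_mapping(merged, [
    ("order_id", "string", "order_id", "string"),
    ("name", "string", "customer_name", "string"),
    ("total", "string", "order_total", "double"),
])
# target now holds Ada's two UK orders, a1 (19.99) and a2 (5.0).
print(target)
```

In a real job the same three steps would operate on DynamicFrames, and the final frame would be written once to its target, so a single job can build each merged table from several sources instead of needing one job per source table.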

Upvotes: 1
