teejay

Reputation: 806

What's an organizing structure and development workflow for AWS Glue jobs?

I've been working with AWS Glue for the past 3-4 months to create PySpark scripts for ETL of large datasets. I'll typically create a notebook to do some exploratory work, then create a full-fledged version of the script, which I trigger manually via the console. I'm at the point where I have working bits and pieces which I now need to string together into a more robust and managed production pipeline.

I will be managing two unrelated datasets, each of which entails successive cleansing and transformation operations performed by different Glue jobs, with intermediate and final data stored in S3.

When I look at my Jobs page in AWS Glue Studio (AWS Glue -> ETL Jobs), everything is jumbled together: notebooks as well as the multiple jobs for each of my data pipelines.

There's plenty of great content available on how to create, run and optimize individual AWS Glue jobs, but I haven't been able to find anything comprehensive that describes best practices for organizing and managing everything. I anticipate that at some point I will add some sort of orchestration layer on top of the jobs, but that still leaves the question of how to organize and manage the underlying jobs themselves.

Questions

  1. Is there a way to organize jobs into three (and in future possibly more) "buckets": production-version jobs for the dataset A pipeline, production-version jobs for the dataset B pipeline, and exploratory notebooks?
  2. What are recommendations for a development workflow for AWS Glue? I'm used to a more traditional application development workflow with distinct environments (Dev, Prod, etc.), a code repository, and a CI/CD pipeline to promote code through environments. I'm wondering what the Glue equivalent is.

Either direct answers or pointers to recommended reading (that does more than just scratch the surface) would be much appreciated.

Upvotes: 3

Views: 551

Answers (1)

teejay

Reputation: 806

I've described below at a high level how I've set things up, but haven't gone into the nitty-gritty details. Happy to answer separate, more focused questions on SO about any of this.

Separation of products and environments

I've created separate Organizational Units (see this post on AWS Organizations and OUs) for my two products. Within each, I've created separate AWS accounts for Dev and Prod. I also created a third Shared Services Organizational Unit (more about this later). In addition to the technical benefits of separation of production code and data, this also allows me to easily split out my costs (I get a single invoice for the root organization, but it's broken out by Organizational Unit, which is useful).

Local development

Doing exploratory work directly within AWS was a pain due to the startup time for Notebooks or Glue jobs (and it gets expensive to just leave a Notebook running), and the AWS Glue Notebook web interface leaves a lot to be desired. I solved this by using the official AWS Glue Docker container for local exploration (here's a post from AWS about it). I use it in two ways:

  1. Run a Jupyter notebook within VSCode against the Spark engine inside the Docker container. This allows me to do early-stage exploratory Spark work, experiment with bits of code, etc. I keep the datasets small since it's all on my laptop. Having the Jupyter notebook in VSCode gives me the benefits of IntelliSense and GitHub Copilot code suggestions.
  2. Run my unit tests against the Spark engine in the container (there's a sketch of this just after the list).
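Here's a minimal sketch of what one of those tests looks like when run with pytest inside the Glue container. In practice the transform under test is imported from my package; it's defined inline here (with made-up column names) so the sketch is self-contained.

```python
import pytest
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def add_full_name(df: DataFrame) -> DataFrame:
    # Stand-in for a real transform that would normally live in the package under test.
    return df.withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))


@pytest.fixture(scope="session")
def spark():
    # Inside the Glue Docker container this picks up the bundled Spark/Glue libraries.
    return SparkSession.builder.master("local[*]").appName("unit-tests").getOrCreate()


def test_add_full_name(spark):
    df = spark.createDataFrame(
        [("Ada", "Lovelace"), ("Grace", "Hopper")],
        ["first_name", "last_name"],
    )
    result = add_full_name(df)
    assert result.columns == ["first_name", "last_name", "full_name"]
    assert result.filter(F.col("full_name") == "Ada Lovelace").count() == 1
```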

Filesystem agnosticism

I use the excellent universal-pathlib package, which means my code doesn't have to care whether data is on the local filesystem or in S3. I use configuration parameters to set a base path: for local development the base path is on my laptop; when running in AWS, the config sets the base path to an S3 location.
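To illustrate the pattern, here's a rough sketch using UPath from universal-pathlib; the bucket name, directory layout and function are placeholders, and reading from S3 this way assumes s3fs is installed alongside universal-pathlib.

```python
from upath import UPath


def load_orders(spark, base_path: str):
    # base_path comes from config: a local directory during development,
    # an S3 URI (e.g. "s3://my-bucket/my-prefix") when running in Glue.
    raw_dir = UPath(base_path) / "raw" / "orders"
    input_files = [str(p) for p in raw_dir.glob("*.parquet")]
    return spark.read.parquet(*input_files)


# Locally:      load_orders(spark, "/home/me/data")
# In AWS Glue:  load_orders(spark, "s3://my-bucket/my-prefix")
```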

AppConfig for configuration

I define my config parameters as YAML within AWS AppConfig. The structure and names are identical for dev and prod, but one instance lives in the development AWS account and the other within the production AWS account. This way I can have different values for dev and prod that point to the correct S3 locations, database connection strings, etc, and the code is ignorant about it.
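For context, here's roughly how a job pulls that YAML at startup with boto3's appconfigdata client; the application, environment and profile names (and the base_path key) are placeholder examples.

```python
import boto3
import yaml


def load_config(app: str, env: str, profile: str) -> dict:
    client = boto3.client("appconfigdata")  # credentials decide dev vs prod account
    session = client.start_configuration_session(
        ApplicationIdentifier=app,
        EnvironmentIdentifier=env,
        ConfigurationProfileIdentifier=profile,
    )
    response = client.get_latest_configuration(
        ConfigurationToken=session["InitialConfigurationToken"]
    )
    return yaml.safe_load(response["Configuration"].read())


config = load_config("my-data-product", "prod", "pipeline-settings")
base_path = config["base_path"]  # e.g. an S3 URI in AWS, a local dir in dev
```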

Promotion of code

I don't have any logic in my Glue scripts. Instead, my core logic is within Python packages, and the scripts themselves are just wrappers to call entry point functions within my packages. This allows me to more easily modularize my logic and share common functionality between different jobs. I publish my packages to AWS CodeArtifact repositories that live within the Shared Services OU/account that I mentioned previously (I used guidance from here and here to get that set up).
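As an example of how thin those scripts are, here's roughly what one looks like; the package and module names (my_product.jobs.cleanse) and the stage argument are made up for illustration.

```python
import sys

from awsglue.utils import getResolvedOptions

# Installed from CodeArtifact (e.g. via the --additional-python-modules job argument).
from my_product.jobs import cleanse

args = getResolvedOptions(sys.argv, ["JOB_NAME", "stage"])
cleanse.run(stage=args["stage"])
```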

Workflow

So now my workflow is:

  1. Start up the Glue Docker container locally, connect to it with VSCode and do my experimental work on my laptop within a Jupyter notebook.
  2. Write my unit tests and code; run the tests within the Docker container.
  3. Run the code locally within the container against a small version of my data.
  4. When all looks good, publish the package to CodeArtifact with the version number incremented.
  5. Update the AWS Glue job within the development account to use the newer version (sketched after this list) and run it against a larger dataset, which allows me to make any tweaks for performance.
  6. Finally, update the AWS Glue job within the production account to use the newer version.
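Steps 5 and 6 boil down to repointing the job's --additional-python-modules argument at the new package version, which could be scripted roughly like this with boto3 (job and package names are placeholders, and the set of read-only keys stripped before update_job may need adjusting):

```python
import boto3


def pin_package_version(job_name: str, package_spec: str) -> None:
    glue = boto3.client("glue")  # credentials select the dev or prod account
    job = glue.get_job(JobName=job_name)["Job"]

    # get_job returns fields that update_job does not accept.
    for key in ("Name", "CreatedOn", "LastModifiedOn", "AllocatedCapacity", "MaxCapacity"):
        job.pop(key, None)

    args = job.get("DefaultArguments", {})
    args["--additional-python-modules"] = package_spec
    job["DefaultArguments"] = args

    glue.update_job(JobName=job_name, JobUpdate=job)


# e.g. after publishing 1.4.0 to CodeArtifact:
# pin_package_version("dataset-a-cleanse", "my_product==1.4.0")
```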

Further improvements I need to make

Code promotion through to production is currently a manual process, so there's scope for automation there. I know nothing about Infrastructure as Code (other than that it's a thing, and it sounds great!), and my account setup is a manual process. It would be nice to have everything defined using IaC, which would allow me to easily spin up/down a separate AWS account as a sandbox for anything experimental I want to do.

Upvotes: 3
