perelin

Reputation: 1448

How can I preview AWS Glue jobs?

I want to use Glue to extract data from an RDS PostgreSQL database, transform/clean it, and load it into an S3 bucket so I can use Athena and QuickSight to visualize the data and create reports.

I'm currently authoring the Glue job for the data cleanup (removing NULL values and the like). But I can see no easy way to preview the results of the job script. I can only see the results in the S3 bucket after running the complete job. Running the job takes at least 10 minutes to start and a few more to finish, so I have a round-trip time of about 15 minutes to see if my code is correct. Is this supposed to be the workflow here? Am I missing anything?

I'm new to the whole BI/data field. Maybe I'm following the wrong approach. I want to visualize data from RDS in QuickSight and need to do some data cleanup first. Are there other approaches that make sense for this scenario? (We are talking about a small dataset of a few hundred MB.)

Thanks!

Upvotes: 0

Views: 1114

Answers (1)

gapvision

Reputation: 1029

Look into notebooks. You can set them up in the AWS Glue console. They give you an interactive way of writing your code before you put it into a Glue job script. There is no big difference between SageMaker (Jupyter) and Zeppelin notebooks for standard cases; I guess it's down to your taste.

In general, especially with small datasets, a local development environment might work for you as well and gives you even more freedom. For larger datasets, a common practice is to take a sample of only a few hundred records so it can be processed almost instantly. That helps a lot during development.
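To illustrate the sampling idea, here is a minimal local sketch in plain Python. It is not Glue code: the rows are made-up dictionaries standing in for an extract from your table, and `take_sample` and `drop_null_rows` are hypothetical helper names. The point is only that the cleanup logic can be tried on a few hundred records in a fraction of a second before it goes into the job.

```python
from itertools import islice


def take_sample(rows, n=200):
    """Return the first n rows from any iterable of rows."""
    return list(islice(rows, n))


def drop_null_rows(rows, required):
    """Keep only rows where every required column is present and non-empty."""
    return [
        row
        for row in rows
        if all(row.get(col) not in (None, "") for col in required)
    ]


# Hypothetical data standing in for an extract from the source table:
# every third row has a missing email.
source_rows = (
    {"id": i, "email": None if i % 3 == 0 else f"user{i}@example.com"}
    for i in range(10_000)
)

# Work on a few hundred records only, so each iteration is near-instant.
sample = take_sample(source_rows, n=300)
cleaned = drop_null_rows(sample, required=["id", "email"])

print(len(sample), len(cleaned))
```

Once the cleanup rules behave as expected on the sample, the same logic can be ported to the Glue script and run once against the full data.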

And lastly: I'm not sure why you want to move away from Postgres. What kind of analysis do you want to do that you can't do in the relational world? Also, why not do the cleanup in the DB?
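As a rough sketch of what cleanup in the DB could look like, the snippet below defines a view that filters out rows with NULL values. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in so the example runs anywhere; the `orders` table, its columns, and the `orders_clean` view are hypothetical. The `CREATE VIEW ... WHERE ... IS NOT NULL` statement itself is standard SQL, so something very similar should work in Postgres.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real Postgres instance.
conn = sqlite3.connect(":memory:")

# Hypothetical source table with some NULL values to clean up.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
    [(1, "alice", 10.0), (2, None, 5.0), (3, "bob", None), (4, "carol", 7.5)],
)

# The cleanup lives in the database as a view: rows with NULLs are excluded.
conn.execute(
    """
    CREATE VIEW orders_clean AS
    SELECT id, customer, amount
    FROM orders
    WHERE customer IS NOT NULL AND amount IS NOT NULL
    """
)

clean_rows = conn.execute(
    "SELECT id, customer, amount FROM orders_clean ORDER BY id"
).fetchall()
print(clean_rows)

conn.close()
```

With a view like this, a reporting tool would read from the cleaned view rather than the raw table, and changes to the cleanup rules show up immediately without a job run.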

Upvotes: 1
