Ankur Shrivastava

Reputation: 243

How to test AWS Glue code without dev endpoint

I would like to avoid using an AWS Glue dev endpoint. Is there a way to test and debug my PySpark code in a local notebook or IDE instead?

Upvotes: 2

Views: 4054

Answers (5)

KM-Yash

Reputation: 133

I was able to test without dev endpoints.

Please follow the instructions here: https://support.wharton.upenn.edu/help/glue-debugging

Upvotes: 0

Dariusz Bielak

Reputation: 415

As others have said, it depends on which parts of Glue you are going to use. If your code is based on pure Spark, without DynamicFrames and the like, then a local installation of Spark may suffice. If, however, you intend to use Glue extensions, there is not really an option that avoids the dev endpoint at this stage.
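One way to keep a script runnable in both places is to check whether the Glue library is importable and fall back to a plain local Spark session when it is not. This is only a sketch under assumptions: the helper names (`glue_available`, `get_spark`, `keep_paid`) and the `status` column are made up for illustration, and it only helps for code that sticks to the ordinary DataFrame API.

```python
import importlib.util


def glue_available():
    """Return True when the awsglue package can be imported (i.e. on Glue)."""
    return importlib.util.find_spec("awsglue") is not None


def get_spark():
    """Return a SparkSession: via GlueContext on Glue, a local master otherwise."""
    if glue_available():
        # Imported lazily so that this module also loads where awsglue is absent.
        from awsglue.context import GlueContext
        from pyspark.context import SparkContext
        return GlueContext(SparkContext.getOrCreate()).spark_session
    from pyspark.sql import SparkSession
    return (
        SparkSession.builder.master("local[*]")
        .appName("glue-local-test")
        .getOrCreate()
    )


def keep_paid(df):
    """Pure DataFrame logic (hypothetical 'status' column); no Glue API needed."""
    return df.filter(df["status"] == "paid")
```

Locally you would call `get_spark()`, build a small DataFrame by hand, and pass it to `keep_paid`. Anything that needs DynamicFrames or the Glue catalog still has to run on Glue itself.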

I hope that this helps.

Upvotes: 1

Yuva

Reputation: 3173

If you are going to deploy your PySpark code on the AWS Glue service, you may have to use GlueContext and other AWS Glue APIs. So if you would like to test against the AWS Glue service using these APIs, you need an AWS Glue dev endpoint.

However, an AWS Glue notebook is optional: you can set up Zeppelin (or similar) and establish an SSH tunnel to the Glue dev endpoint for development and testing from your local environment. Make sure you delete the dev endpoint once your development/testing is done for the day.

Alternatively, if you are not keen on using AWS Glue APIs other than GlueContext, then yes, you can set up Zeppelin in your local environment, test the code locally, then upload your code to S3 and create a Glue job to test it in the AWS Glue service.

Upvotes: 0

Sandeep Fatangare

Reputation: 2144

We use pytest to test PySpark code. We keep the PySpark code in a separate file and call those functions from the Glue code file. With this separation, we can unit test the PySpark code using pytest.
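A rough sketch of what that separation can look like. All of the names here (`clean_record`, `clean_dataframe`, the `name` and `amount_cents` fields) are hypothetical and not taken from our actual code. In this variant the row-level logic is plain Python, so the test needs neither Glue nor a Spark session; the DataFrame wrapper imports PySpark only when it is called.

```python
# transforms.py (hypothetical module name): no awsglue imports anywhere in here.

def clean_record(record):
    """Plain-Python row logic: tidy the name and convert cents to an amount."""
    return {
        "name": record["name"].strip().title(),
        "amount": round(record["amount_cents"] / 100, 2),
    }


def clean_dataframe(df):
    """Apply clean_record to every row of a Spark DataFrame."""
    from pyspark.sql import Row  # lazy import: only needed when Spark is used
    return df.rdd.map(lambda row: Row(**clean_record(row.asDict()))).toDF()


# test_transforms.py (hypothetical): pytest collects functions named test_*.
def test_clean_record():
    result = clean_record({"name": "  ada lovelace ", "amount_cents": 1999})
    assert result == {"name": "Ada Lovelace", "amount": 19.99}
```

The Glue job script then only creates the GlueContext, reads the data, and calls `clean_dataframe`. Functions that work on whole DataFrames can be tested the same way by creating a local SparkSession inside a pytest fixture.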

Upvotes: 0

noobius

Reputation: 1539

We have a setup where PySpark is installed locally and we use VSCode to develop, unit test, and debug our PySpark code. We run the code against the local PySpark installation during development, then deploy it to EMR to run against the real dataset.

I'm not sure how much of this applies to what you're trying to do with Glue, as it's a level higher in abstraction.

Upvotes: 0
