JonTroncoso

Reputation: 881

How do I convert from a DataFrame to a DynamicFrame locally and WITHOUT using Glue dev endpoints?

I'm trying to run unit tests on my pyspark scripts locally so that I can integrate this into our CI.

$ pyspark
...
>>> df = spark.createDataFrame([(1, 2, 3), (4, 5, 6)])
>>> df.show()
+---+---+---+
| _1| _2| _3|
+---+---+---+
|  1|  2|  3|
|  4|  5|  6|
+---+---+---+

As per the documentation, I should be able to convert using the following:

from awsglue.dynamicframe import DynamicFrame
dynamic_frame = DynamicFrame.fromDF(dataframe, glue_ctx, name)

But when I try to convert to a DynamicFrame, I get errors when trying to instantiate the GlueContext:

$ pyspark
>>> from awsglue.context import GlueContext
>>> sc
<SparkContext master=local[*] appName=PySparkShell>
>>> glueContext = GlueContext(sc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/awsglue/context.py", line 43, in __init__
    self._glue_scala_context = self._get_glue_scala_context(**options)
  File "/Library/Python/2.7/site-packages/awsglue/context.py", line 63, in _get_glue_scala_context
    return self._jvm.GlueContext(self._jsc.sc())
TypeError: 'JavaPackage' object is not callable

How do I get this working WITHOUT using AWS Glue dev endpoints? I don't want to be charged EVERY TIME I commit my code. That's absurd.

Upvotes: 1

Views: 4666

Answers (2)

Sandeep Fatangare

Reputation: 2144

Why do you want to convert from a DataFrame to a DynamicFrame? You can't unit test code that uses the Glue APIs anyway, since there are no mocks for them.

I prefer the following approach:

  1. Write two files per Glue job: job_glue.py and job_pyspark.py
  2. Put the Glue-API-specific code in job_glue.py
  3. Put the non-Glue (plain PySpark) code in job_pyspark.py
  4. Write pytest test cases for job_pyspark.py

Upvotes: 1

TEJASWAKUMAR

Reputation: 85

I don't think there is currently any alternative to using Glue for this. For reference: Can I test AWS Glue code locally?

Upvotes: 0
