Reputation: 881
I'm trying to run unit tests on my PySpark scripts locally so that I can integrate them into our CI.
$ pyspark
...
>>> import pandas as pd
>>> df = pd.DataFrame([(1,2,3), (4,5,6)])
>>> df
   0  1  2
0  1  2  3
1  4  5  6
As per the documentation, I should be able to convert using the following:
from awsglue.dynamicframe import DynamicFrame
dynamic_frame = DynamicFrame.fromDF(dataframe, glue_ctx, name)
But when I try to convert to a DynamicFrame, I get errors when instantiating the GlueContext:
$ pyspark
>>> from awsglue.context import GlueContext
>>> sc
<SparkContext master=local[*] appName=PySparkShell>
>>> glueContext = GlueContext(sc)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/awsglue/context.py", line 43, in __init__
self._glue_scala_context = self._get_glue_scala_context(**options)
File "/Library/Python/2.7/site-packages/awsglue/context.py", line 63, in _get_glue_scala_context
return self._jvm.GlueContext(self._jsc.sc())
TypeError: 'JavaPackage' object is not callable
How do I get this working WITHOUT using AWS Glue Dev Endpoints? I don't want to be charged EVERY TIME I commit my code. That's absurd.
Upvotes: 1
Views: 4666
Reputation: 2144
Why do you want to convert from a DataFrame to a DynamicFrame at all? You can't unit test using the Glue APIs anyway, since there are no mocks for them.
I prefer the following approach: keep the transformation logic in plain PySpark and test it with a local SparkSession, as sketched below.
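A minimal sketch of that approach (the transform function, column names, and the pytest fixture are illustrative assumptions, not part of any Glue API): write the core logic as DataFrame-in, DataFrame-out functions, do the DynamicFrame/GlueContext conversion only at the edges of the Glue job, and unit test the core against a local SparkSession, so no Glue libraries or dev endpoints are needed.

import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def transform(df):
    # Pure DataFrame logic: this is the part worth unit testing.
    return df.filter(F.col("amount") > 0).withColumn("doubled", F.col("amount") * 2)

@pytest.fixture(scope="session")
def spark():
    # Local SparkSession for tests; no GlueContext or dev endpoint required.
    session = SparkSession.builder.master("local[2]").appName("unit-tests").getOrCreate()
    yield session
    session.stop()

def test_transform(spark):
    df = spark.createDataFrame([(1,), (-2,), (3,)], ["amount"])
    result = transform(df).collect()
    assert [row["doubled"] for row in result] == [2, 6]

Run it with pytest; the only dependency is pyspark itself, so the tests can run in CI without touching AWS.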
Upvotes: 1
Reputation: 85
I think at present there is no alternative for us other than using Glue. For reference: Can I test AWS Glue code locally?
Upvotes: 0