Reputation: 107
I am trying to read Input.csv from an S3 bucket, get the distinct values (and do some other transformations), and then write the result to Target.csv, but I run into issues when writing the data to Target.csv in the S3 bucket.
Below is the code:
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
glueContext = GlueContext(SparkContext.getOrCreate())
dfnew = glueContext.create_dynamic_frame_from_options("s3", {'paths': ["s3://bucket_name/Input.csv"] }, format="csv" )
dfMod = dfnew.select_fields(["Col2","Col3"]).toDF().distinct()
dnFrame = DynamicFrame.fromDF(dfMod, glueContext, "test_nest")
datasink = glueContext.write_dynamic_frame.from_options(frame = dnFrame, connection_type = "s3",connection_options = {"path": "s3://bucket_name/Target.csv"}, format = "csv", transformation_ctx ="datasink")
This is the data in Input.csv:
Col1 Col2 Col3
1 1 -30.4
2 2 -30.5
3 3 6.70
4 4 5.89
5 4 6.89
6 4 6.70
7 4 5.89
8 4 5.89
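To make the intended result concrete, here is a minimal plain-Python sketch of "distinct on Col2 and Col3" applied to the sample rows above. It is a stand-in for the Spark distinct() call, not Glue code, and it assumes the file is comma-separated with a header row (the listing above shows the columns separated by spaces). The name distinct_pairs is made up for illustration.

```python
import csv
import io

# Sample rows from the question, assumed here to be comma-separated with a header row.
SAMPLE_CSV = """Col1,Col2,Col3
1,1,-30.4
2,2,-30.5
3,3,6.70
4,4,5.89
5,4,6.89
6,4,6.70
7,4,5.89
8,4,5.89
"""


def distinct_pairs(csv_text, columns=("Col2", "Col3")):
    """Return the distinct value tuples for the given columns, in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    result = []
    for row in reader:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


pairs = distinct_pairs(SAMPLE_CSV)
for pair in pairs:
    print(pair)
```

Rows 7 and 8 repeat the (4, 5.89) pair from row 4, so six distinct pairs remain. Unlike this sketch, Spark's distinct() does not guarantee any particular row order.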
Error:
val dfmod = dfnew.select_fields(["Col2","Col3"]).toDF().distinct().show()
^
SyntaxError: invalid syntax
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/amazon/bin/runscript.py", line 92, in <module>
while "runpy.py" in new_stack.tb_frame.f_code.co_filename:
AttributeError: 'NoneType' object has no attribute 'tb_frame'
I understand this happens because I am using create_dynamic_frame_from_options rather than from_catalog, but how do I get the desired functionality while using from_options (since my data is a CSV file in S3)?
IAM (Glue service policy):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::bucket_Name/Output/**/**/*"
]
}
]
}
S3 Bucket Policy:
{
"Version": "2012-10-17",
"Id": "Policy***",
"Statement": [
{
"Sid": "Stmt1***",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::account_number:root"
},
"Action": "s3:*",
"Resource": "arn:aws:s3:::bucket_name"
}
]
}
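As a side check on the two policies above, the following sketch tests whether the Resource pattern in the Glue service policy would cover the objects the job reads and writes. It only approximates IAM wildcard matching with Python's fnmatch (an assumption; the IAM policy simulator is the authoritative tool), and resource_matches is a hypothetical helper name. Note also that the bucket policy's Resource names the bucket itself; object-level actions such as s3:GetObject apply to object ARNs of the form arn:aws:s3:::bucket_name/*.

```python
from fnmatch import fnmatchcase

# Resource pattern copied from the Glue service policy in the question.
POLICY_RESOURCE = "arn:aws:s3:::bucket_Name/Output/**/**/*"

# Object ARNs for the paths used by the job (placeholder names from the question).
INPUT_ARN = "arn:aws:s3:::bucket_name/Input.csv"
TARGET_ARN = "arn:aws:s3:::bucket_name/Target.csv"

# Hypothetical object under the Output/ prefix, for comparison only.
NESTED_ARN = "arn:aws:s3:::bucket_Name/Output/a/b/c.csv"


def resource_matches(pattern, arn):
    """Approximate IAM-style wildcard matching ('*' matches any characters, including '/')."""
    return fnmatchcase(arn, pattern)


for arn in (INPUT_ARN, TARGET_ARN, NESTED_ARN):
    print(arn, "->", resource_matches(POLICY_RESOURCE, arn))
```

Under this approximation, neither Input.csv nor Target.csv at the bucket root is covered by the pattern, because both sit outside the Output/ prefix.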
Kindly help
Upvotes: 3
Views: 1182
Reputation: 10333
The syntax error is on this line:
val dfMod = dfnew.select_fields(["Col2","Col3"]).toDF().distinct().show()
It can be corrected as follows. We don't need val (that is Scala syntax, not Python) or show() (it only prints the DataFrame and returns None).
distinct() simply returns a DataFrame, which we convert to a DynamicFrame before passing it to write_dynamic_frame.
The script also needs this import statement at the top: from awsglue.dynamicframe import DynamicFrame
dfMod = dfnew.select_fields(["Col2","Col3"]).toDF().distinct()
dnFrame = DynamicFrame.fromDF(dfMod, glueContext, "test_nest")
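For completeness, here is a sketch of how the whole read, distinct, write flow might look with from_options. It rests on a few assumptions that are not stated in the question: that Input.csv has a header row (hence the withHeader format option; without it, as far as I know, Glue names the columns col0, col1, and so on, and Col2/Col3 would not be found), and that it is acceptable for the output path to be treated as a prefix under which Glue writes part files rather than as a single file named Target.csv. The helper names read_args, write_args and run_job are made up for illustration.

```python
# Bucket and file names are the placeholders used in the question.
INPUT_PATH = "s3://bucket_name/Input.csv"
OUTPUT_PATH = "s3://bucket_name/Target.csv"  # used as a prefix for the output part files


def read_args():
    """Keyword arguments for glueContext.create_dynamic_frame.from_options."""
    return {
        "connection_type": "s3",
        "connection_options": {"paths": [INPUT_PATH]},
        "format": "csv",
        # Assumption: the CSV file has a header row with Col1, Col2, Col3.
        "format_options": {"withHeader": True},
    }


def write_args(frame):
    """Keyword arguments for glueContext.write_dynamic_frame.from_options."""
    return {
        "frame": frame,
        "connection_type": "s3",
        "connection_options": {"path": OUTPUT_PATH},
        "format": "csv",
        "transformation_ctx": "datasink",
    }


def run_job(glue_context):
    """Read the CSV, keep distinct (Col2, Col3) rows, and write them back to S3."""
    # Imported lazily so the helpers above can be inspected outside a Glue environment.
    from awsglue.dynamicframe import DynamicFrame

    dfnew = glue_context.create_dynamic_frame.from_options(**read_args())
    dfMod = dfnew.select_fields(["Col2", "Col3"]).toDF().distinct()
    dnFrame = DynamicFrame.fromDF(dfMod, glue_context, "test_nest")
    return glue_context.write_dynamic_frame.from_options(**write_args(dnFrame))
```

Inside a Glue job script this would be called as run_job(GlueContext(SparkContext.getOrCreate())), using the same imports as in the question.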
Upvotes: 1