umdev

Reputation: 359

Unable to parse file from AWS Glue dynamic_frame to Pyspark Data frame

I am new to AWS Glue.

I am having trouble converting a Glue DynamicFrame to a PySpark DataFrame.

Below is the crawler configuration I created for reading the CSV file:

glue_cityMapDB = "csvDb"
glue_cityMapTbl = "csv table"

datasource2 = glue_context.create_dynamic_frame.from_catalog(database = glue_cityMapDB, table_name = glue_cityMapTbl, transformation_ctx = "datasource2")

datasource2.show()

print("Show the data source2 city DF")
cityDF=datasource2.toDF()
cityDF.show()

Output:

Here I get output from the Glue DynamicFrame (datasource2.show()), but after converting it to a PySpark DataFrame I get the following error:

S3NativeFileSystem (S3NativeFileSystem.java:open(1208)) - Opening 's3://s3source/read/names.csv' for reading
2020-04-24 05:08:39,789 ERROR [Executor task launch worker for task

I would appreciate any help with this.

Upvotes: 3

Views: 8287

Answers (1)

Eman

Reputation: 851

Make sure the file is UTF-8 encoded. You can check the encoding with the file command and convert it with iconv, or re-save it as UTF-8 in a text editor such as Sublime Text.
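If you would rather check the encoding from Python, something like the following may work on a local copy of the CSV. This is only a sketch: the helper names is_utf8 and convert_to_utf8 are made up for illustration, and the latin-1 default for the source encoding is an assumption, so replace it with whatever encoding your file actually uses.

```python
from pathlib import Path


def is_utf8(path):
    """Return True if the whole file decodes cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src, dst, source_encoding="latin-1"):
    """Re-encode src as UTF-8 and write it to dst.

    source_encoding is an assumption; it must match the real encoding of src.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

Note that decoding as latin-1 never raises an error, so a wrong guess for the source encoding produces garbled characters rather than a failure. It is worth opening the converted file and checking a few rows before uploading it back to S3.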

You can also read the files as a dataframe using:

df = spark.read.csv('s3://s3source/read/names.csv')

Then convert it to a DynamicFrame using fromDF().
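A rough sketch of those two steps together is below. The function name read_csv_as_dynamic_frame and the frame name "city_dyf" are placeholders, and header=True and the encoding argument are assumptions about the file, so adjust them as needed. It expects the spark session and glue_context that already exist in your job.

```python
def read_csv_as_dynamic_frame(spark, glue_context, path, encoding="UTF-8"):
    """Read a CSV with Spark, then wrap the result in a Glue DynamicFrame."""
    # awsglue is only available in the Glue runtime, so it is imported here
    # rather than at module level.
    from awsglue.dynamicframe import DynamicFrame

    # header=True assumes the first row holds column names (an assumption).
    df = spark.read.csv(path, header=True, encoding=encoding)

    # "city_dyf" is just an illustrative name for the resulting frame.
    return DynamicFrame.fromDF(df, glue_context, "city_dyf")
```

In the job you would call it as city_dyf = read_csv_as_dynamic_frame(spark, glue_context, 's3://s3source/read/names.csv'). If the file is not UTF-8, passing its real encoding here may avoid having to convert the file first.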

Upvotes: 2
