Akhil

Reputation: 79

Unable to convert AWS Glue DynamicFrame into Spark DataFrame

I'm trying to convert a Glue DynamicFrame into a Spark DataFrame using DynamicFrame.toDF, but I'm getting this exception:

Traceback (most recent call last):
  File "/tmp/ManualJOB", line 62, in <module>
    df1 = datasource0.toDF()
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/dynamicframe.py", line 147, in toDF
    return DataFrame(self._jdf.toDF(self.glue_ctx._jvm.PythonUtils.toSeq(scala_options)), self.glue_ctx)
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o176.toDF.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 15.0 failed 4 times, most recent failure: Lost task 1.3 in stage 15.0 (TID 198, 172.31.0.175, executor 6): com.amazonaws.services.glue.util.FatalException: Unable to parse file: Manual Bound.csv

Can anyone help me with what I am missing?

Thanks in advance!

Upvotes: 2

Views: 2528

Answers (1)

Prabhakar Reddy

Reputation: 5144

This error occurs when the file contains characters that are not valid UTF-8. Glue only supports UTF-8 encoding, as stated in the AWS Glue documentation:

Text-based data, such as CSVs, must be encoded in UTF-8 for AWS Glue to process it successfully. For more information, see UTF-8 in Wikipedia.

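You can check whether your file contains invalid characters by running the command below. If iconv hits a byte sequence that is not valid UTF-8, it reports the position and echo $? prints a non-zero exit status; a status of 0 means the file decoded cleanly. This is for Linux; use an equivalent tool on other operating systems.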

iconv -f UTF-8 your_file -o /dev/null; echo $?
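
If iconv is not available (for example on Windows), the same check can be sketched in Python. This is a minimal sketch, not part of the original answer: the function name first_invalid_utf8_offset is hypothetical, and it assumes the file is small enough to read into memory.

```python
def first_invalid_utf8_offset(path):
    """Return the byte offset of the first invalid UTF-8 sequence,
    or None if the whole file decodes cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()  # assumption: the CSV fits in memory
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded
        return exc.start
    return None
```

A return value of None corresponds to iconv exiting with status 0; any integer is the offset of the first byte to inspect in the file.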

To convert the file to UTF-8, pass the CSV to the command below. The -f value must match the file's actual encoding; ISO-8859-1 is used here as an example.

iconv -f ISO-8859-1 -t UTF-8 file.csv > file-utf8.csv
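
The conversion can also be sketched in Python, which may be convenient if the file is prepared by a script before being uploaded for Glue. This is a minimal sketch under assumptions: the function name convert_to_utf8 is hypothetical, and ISO-8859-1 is only the default guess for the source encoding, mirroring the iconv command above.

```python
def convert_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Re-encode a text file to UTF-8, assuming it is currently in src_encoding."""
    # newline="" on both sides leaves the original line endings untouched,
    # which matters for CSV files that use \r\n.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Note that ISO-8859-1 maps every possible byte to a character, so decoding with it never fails. If the file is really in another encoding (for example Windows-1252), the conversion will still succeed but some characters may come out wrong, so it is worth spot-checking the output.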

Upvotes: 1
