Reputation: 79
I'm trying to convert a Glue DynamicFrame into a Spark DataFrame using DynamicFrame.toDF, but I'm getting this exception:
Traceback (most recent call last):
  File "/tmp/ManualJOB", line 62, in <module>
    df1 = datasource0.toDF()
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/dynamicframe.py", line 147, in toDF
    return DataFrame(self._jdf.toDF(self.glue_ctx._jvm.PythonUtils.toSeq(scala_options)), self.glue_ctx)
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o176.toDF.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 15.0 failed 4 times, most recent failure: Lost task 1.3 in stage 15.0 (TID 198, 172.31.0.175, executor 6): com.amazonaws.services.glue.util.FatalException: Unable to parse file: Manual Bound.csv
Can anyone help me with what I am missing?
Thanks in advance!
Upvotes: 2
Views: 2528
Reputation: 5144
This issue happens when the file contains characters that are not valid UTF-8. Glue only supports UTF-8 encoding, as per this doc:
Text-based data, such as CSVs, must be encoded in UTF-8 for AWS Glue to process it successfully. For more information, see UTF-8 in Wikipedia.
You can verify whether your file has invalid characters by running the command below, which will report them and return a non-zero exit status. This is for Linux; use the equivalent if you are on another operating system.
iconv -f UTF-8 your_file -o /dev/null; echo $?
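If you'd rather do the check in Python (e.g. inside the Glue job itself, before calling toDF), a minimal sketch is below. The function name and path are placeholders; it simply tries to decode the raw bytes as UTF-8 and reports the byte offset of the first invalid byte, mirroring what the iconv check tells you.

```python
# Pure-Python alternative to the iconv check: returns None if the
# file is valid UTF-8, otherwise the byte offset of the first bad byte.
def find_invalid_utf8(path):
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return None  # file is valid UTF-8
    except UnicodeDecodeError as e:
        return e.start  # byte offset of the first invalid byte
```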
To convert the file to UTF-8, you can pass the CSV to the command below (here assuming the source encoding is ISO-8859-1):
iconv -f ISO-8859-1 -t UTF-8 file.csv > file-utf8.csv
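The same conversion can be done in Python if iconv isn't available where the file lives. This is a sketch with placeholder paths, assuming the source file is ISO-8859-1; adjust src_encoding to whatever your file actually uses.

```python
# Re-encode a CSV to UTF-8, equivalent to the iconv command above.
# Assumes the source encoding is ISO-8859-1 unless told otherwise.
def reencode_to_utf8(src, dst, src_encoding="iso-8859-1"):
    with open(src, "r", encoding=src_encoding) as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(line)
```

After re-encoding, upload the UTF-8 file back to S3 and re-run the Glue job against it.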
Upvotes: 1