I am running a PySpark job in an Azure Synapse workspace, and it is failing with the following error. Can someone help me debug it?
The error occurs in a Spark application that is run by a pipeline in Azure Synapse.
Stacktrace: An error occurred while calling o1394.execute.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 94.0 failed 4 times, most recent failure: Lost task 0.3 in stage 94.0 (TID 2313) (vm-1d164027 executor 3): java.io.EOFException
at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:85)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:520)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:476)
at
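The error message indicates that the Spark job fails because it hits an EOFException while reading Parquet files. The stack trace shows where this happens: ParquetFileReader.readFooter calls BytesUtils.readIntLittleEndian to read the 4-byte footer length stored near the end of the file, and the stream ended before those 4 bytes could be read. This suggests that at least one of the Parquet files being read is incomplete (truncated) or corrupt.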
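To make the failing footer read concrete: a Parquet file starts with the 4-byte magic `PAR1` and ends with the footer metadata, a 4-byte little-endian footer length, and `PAR1` again. The sketch below is a simplified Python model of that tail check, not the actual parquet-mr code (`read_footer_length` is a hypothetical helper). It shows how a file that was cut short fails at exactly this step.

```python
import struct

MAGIC = b"PAR1"  # 4-byte magic at the start and end of every Parquet file


def read_footer_length(data: bytes) -> int:
    """Return the footer length recorded in the tail of a Parquet file's bytes.

    Simplified model of the layout:
    PAR1 | column data | footer | 4-byte little-endian footer length | PAR1
    """
    # Smallest possible frame: leading magic + footer length field + trailing magic.
    if len(data) < 12:
        raise EOFError(f"only {len(data)} bytes; too short to hold a Parquet footer")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("PAR1 magic bytes missing; file is truncated or not Parquet")
    # The footer length sits in the 4 bytes just before the trailing magic.
    (footer_len,) = struct.unpack("<i", data[-8:-4])
    if footer_len < 0 or footer_len > len(data) - 12:
        raise EOFError(f"footer length {footer_len} points past the start of the file")
    return footer_len


# A well-formed toy file: 10 zero bytes stand in for the footer metadata.
toy = MAGIC + b"\x00" * 10 + struct.pack("<i", 10) + MAGIC
print(read_footer_length(toy))  # 10

# The same file cut short, as an interrupted write might leave it.
try:
    read_footer_length(toy[:10])
except EOFError as exc:
    print("EOFError:", exc)
```

The real reader works on a stream from storage rather than an in-memory byte string, but the principle is the same: if the bytes where the footer length should be are not there, the read fails before any row data is touched.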
To debug this issue, inspect the Parquet files the job reads to see whether any of them are damaged. One way to do this is with the parquet-tools command-line utility, which can print a Parquet file's schema, metadata, and contents. A file whose footer cannot be read will also fail there, which helps you identify the incomplete or corrupted file.
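If the input is made up of many part files, a quick structural scan can narrow down which one is damaged before you inspect it with parquet-tools. This is a minimal sketch that assumes the files have been copied to (or mounted at) a local directory. `suspicious_parquet_files` is a hypothetical helper, and it only checks file size, the `PAR1` magic bytes, and the recorded footer length, not the footer contents.

```python
import os
import struct

MAGIC = b"PAR1"  # 4-byte magic at the start and end of every Parquet file


def suspicious_parquet_files(root: str):
    """Yield (path, reason) for files under root whose Parquet framing looks broken."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden and underscore-prefixed directories (metadata, not data).
        dirnames[:] = [d for d in dirnames if not d.startswith(("_", "."))]
        for name in sorted(filenames):
            # Spark generally ignores files starting with "_" or "." (e.g. _SUCCESS).
            if name.startswith(("_", ".")):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            # Leading magic + 4-byte footer length + trailing magic = 12 bytes minimum.
            if size < 12:
                yield path, f"only {size} bytes"
                continue
            with open(path, "rb") as f:
                head = f.read(4)
                f.seek(-8, os.SEEK_END)
                tail = f.read(8)  # footer length (4 bytes) + trailing magic (4 bytes)
            if head != MAGIC or tail[4:] != MAGIC:
                yield path, "missing PAR1 magic bytes"
                continue
            (footer_len,) = struct.unpack("<i", tail[:4])
            if footer_len < 0 or footer_len > size - 12:
                yield path, f"footer length {footer_len} does not fit in {size} bytes"
```

For example, `for path, reason in suspicious_parquet_files("/tmp/my_table"): print(path, reason)`, where `/tmp/my_table` is a placeholder for your local copy of the input. A file that passes this check can still be corrupt inside, so confirm any hits with parquet-tools.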
If parquet-tools does not reveal a problem with the files, the cause may instead be an issue in the Parquet library implementation.