Ashok choudhary

Reputation: 3

SparkException: Job aborted

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 76.0 failed 4 times, most recent failure: Lost task 5.3 in stage 76.0 (TID 2334) (10.139.64.5 executor 6): com.databricks.sql.io.FileReadException: Error while reading file <File_Path> It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved. If Delta cache is stale or the underlying files have been removed, you can invalidate Delta cache manually by restarting the cluster.

Upvotes: 0

Views: 7523

Answers (2)

restlessmodem

Reputation: 448

In addition to what the answer by AbhishekKhandave-MT suggests, you can try explicitly repairing the table:

FSCK REPAIR TABLE delta.`path/to/delta`

This also fixes scenarios where the underlying files of the table have actually been changed or removed without the change being reflected in the "_delta_log" transaction log.
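For illustration, a minimal sketch (the path is a placeholder for your actual table location; DRY RUN is the option that only lists the broken file entries without modifying the transaction log):

-- preview the file entries that would be removed
FSCK REPAIR TABLE delta.`path/to/delta` DRY RUN

-- remove entries for files that no longer exist in storage
FSCK REPAIR TABLE delta.`path/to/delta`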

Upvotes: 1

Abhishek Khandave

Reputation: 3240

There are two things you can try for this error:

  1. Refresh table

This invalidates the cached entries of the Apache Spark cache, which include both the data and the metadata of the given table or view. The invalidated cache is repopulated lazily the next time the cached table, or a query associated with it, is executed (see the example after this list).

REFRESH [TABLE] table_name
  2. Manually restart the cluster.
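As a sketch of the first option (the table name and path below are placeholders):

-- refresh a table registered in the metastore
REFRESH TABLE table_name

-- or refresh a path-based Delta table directly
REFRESH TABLE delta.`path/to/delta`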

Upvotes: 0
