Reputation: 3
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 76.0 failed 4 times, most recent failure: Lost task 5.3 in stage 76.0 (TID 2334) (10.139.64.5 executor 6): com.databricks.sql.io.FileReadException: Error while reading file <File_Path>. It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved. If Delta cache is stale or the underlying files have been removed, you can invalidate Delta cache manually by restarting the cluster.
Upvotes: 0
Views: 7523
Reputation: 448
In addition to what AbhishekKhandave-MT's answer suggests, you can try explicitly repairing the table:
FSCK REPAIR TABLE delta.`path/to/delta`
This also fixes scenarios where underlying files of the table have been deleted or changed in storage without that change being reflected in the "_delta_log" transaction log: FSCK REPAIR TABLE removes the stale file entries from the log.
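If you want to preview what the repair would do before it rewrites the transaction log, here is a minimal sketch, assuming a hypothetical Delta path /mnt/data/events and a Databricks notebook where spark is already defined (DRY RUN is a Databricks-specific option of FSCK REPAIR TABLE):

# List the file entries FSCK would remove from the Delta log,
# without modifying the table.
spark.sql("FSCK REPAIR TABLE delta.`/mnt/data/events` DRY RUN").show(truncate=False)

# If the listed entries look right, run the actual repair:
spark.sql("FSCK REPAIR TABLE delta.`/mnt/data/events`")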
Upvotes: 1
Reputation: 3240
There are two ways you can try for this error:

1. Refresh the table. This invalidates the cached entries for the Apache Spark cache, which include the data and metadata of the given table or view. The invalidated cache is then populated lazily the next time the cached table, or a query associated with it, is executed (see the sketch after this list):

REFRESH [TABLE] table_name

2. Restart the cluster. As the error message itself suggests, if the Delta cache is stale or the underlying files have been removed, you can invalidate the Delta cache manually by restarting the cluster.
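A minimal sketch of the first option from PySpark, assuming a hypothetical table name sales.events and that spark is an active SparkSession (as it is in a Databricks notebook):

# Invalidate Spark's cached data and metadata for the table;
# the cache is repopulated lazily on the next read.
spark.sql("REFRESH TABLE sales.events")

# Equivalent call through the catalog API:
spark.catalog.refreshTable("sales.events")

# Recreating the DataFrame, as the error message suggests, also
# picks up the current state of the underlying files:
df = spark.read.table("sales.events")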
Upvotes: 0