Reputation: 577
I have a dataframe that I cannot .show(). Every time I try, it gives the following error. Is it possible that there is a corrupted column?
Error:
Py4JJavaError: An error occurred while calling o426.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 381.0 failed 4 times,
most recent failure: Lost task 0.3 in stage 381.0 (TID 19204, ddlps28.rsc.dwo.com, executor 99):
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2/python/pyspark/worker.py", line 177, in main
Upvotes: 2
Views: 5841
Reputation: 879
Your error most likely isn't actually in the .show() operation; .show() is just the action that triggers execution of your DAG. Since you said it works when you don't run your UDF, you probably have a different error inside that UDF (see the sketch below). The real stack trace will be on the worker nodes, so go through your Hadoop UI to reach the executor logs and see what is actually breaking.
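To illustrate why the failure only surfaces at .show(), here is a minimal sketch. The column name raw, the UDF parse_value, and the sample data are all hypothetical stand-ins for whatever your real UDF does; the point is that the transformation is lazy and the Python code in the UDF only runs when an action forces it, and that catching per-row failures inside the UDF lets .show() succeed so you can inspect the offending rows.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

# Hypothetical data: the last row would crash a naive float() conversion.
df = spark.createDataFrame([("1.5",), ("2.0",), ("not-a-number",)], ["raw"])

def parse_value(s):
    # Defensive wrapper: swallow per-row failures instead of killing the task,
    # so bad rows become NULLs you can filter out and inspect afterwards.
    try:
        return float(s)
    except (TypeError, ValueError):
        return None

parse_udf = udf(parse_value, DoubleType())

# Lazy: no Python code runs here, so a broken UDF raises no error yet.
parsed = df.withColumn("value", parse_udf("raw"))

# Only this action executes the DAG -- this is where an unhandled UDF error
# would surface as the Py4JJavaError / PythonException you are seeing.
parsed.show()

# Rows the UDF could not handle.
parsed.filter(parsed.value.isNull()).show()
```

Whether you return None or re-raise with a more descriptive message is a judgment call; returning None keeps the job alive for debugging, while raising preserves fail-fast behavior once the input is trusted.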
Upvotes: 3