Reputation: 569
I ran a Jupyter PySpark notebook on an EMR 7.3.0 cluster and hit the error below after a simple df.count()
call. This does not look like an issue with my code: the same DataFrame (df
) had already been cached (df.cache()
) and displayed (df.orderBy('someRow').show(20, False)
, etc.) several times in previous cells with no issue.
Any idea what's going wrong with EMR/PySpark, and how can I resolve it?
Exception in thread cell_monitor-13:
Traceback (most recent call last):
  File "/mnt/notebook-env/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/mnt/notebook-env/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/notebook-env/lib/python3.9/site-packages/awseditorssparkmonitoringwidget/cellmonitor.py", line 178, in cell_monitor
    job_binned_stages[job_id][stage_id] = all_stages[stage_id]
KeyError: 11718
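For context, the failing line in cellmonitor.py indexes a stage dictionary directly, so a stage id that a job still references but that is absent from all_stages raises exactly this KeyError. The sketch below reproduces that lookup pattern in plain Python (only the names job_binned_stages, all_stages, job_id, and stage_id come from the traceback; the data is invented) and shows how a defensive membership check would avoid it:

```python
# Minimal sketch of the lookup pattern seen in the traceback.
# The dictionaries are invented; only the variable names come from cellmonitor.py.
all_stages = {11716: "stage-11716-metrics"}  # stage 11718 is missing here
jobs = {42: [11716, 11718]}                  # but the job still references it

job_binned_stages = {}
for job_id, stage_ids in jobs.items():
    job_binned_stages[job_id] = {}
    for stage_id in stage_ids:
        # Direct indexing, as in the widget, raises KeyError: 11718:
        #   job_binned_stages[job_id][stage_id] = all_stages[stage_id]
        # A defensive lookup skips stages absent from all_stages instead:
        if stage_id in all_stages:
            job_binned_stages[job_id][stage_id] = all_stages[stage_id]

print(job_binned_stages)
```

Since this happens inside the AWS monitoring widget's background thread rather than in user code, the df.count() itself may well have completed; the error would come from the widget losing track of a stage while polling job progress.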
Upvotes: 0
Views: 21