c3p0

Reputation: 125

Sagemaker PySpark: Kernel Dead

I followed the instructions here to set up an EMR cluster and a SageMaker notebook. I did not have any errors until the last step.

When I open a new Notebook in Sagemaker, I get the message:

The kernel appears to have died. It will restart automatically.

And then:

        The kernel has died, and the automatic restart has failed.
        It is possible the kernel cannot be restarted.
        If you are not able to restart the kernel, you will still be able to save the
        notebook, but running code will no longer work until the notebook is reopened.

This only happens when I use the pyspark/Sparkmagic kernel. Notebooks opened with the Conda kernel or any other kernel work fine.

My EMR cluster is set up exactly as described in the instructions, with one additional configuration classification:

[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]

I'd appreciate any pointers on why this is happening and how I can debug/fix.

P.S.: I've done this successfully in the past without any issues. When I tried re-doing this today, I ran into this issue. I tried re-creating the EMR clusters and Sagemaker notebooks, but that didn't help.

Upvotes: 2

Views: 2270

Answers (1)

Neelam Gehlot

Reputation: 422

Thank you for using Amazon SageMaker.

The issue here is that Pandas 0.23.0 moved a core class named DataError to a different namespace, and SparkMagic has not been updated to import DataError from the correct location.

The workaround for this issue is to downgrade the Pandas version on the SageMaker Notebook Instance with pip install pandas==0.22.0.
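As a minimal sketch, assuming the Sparkmagic kernel runs in the notebook instance's python3 conda environment (the environment name is an assumption; adjust it to your setup), you could run the following from a terminal on the notebook instance and then restart the kernel:

# Assumption: Sparkmagic uses the python3 conda environment on this instance.
source activate python3
pip install pandas==0.22.0
# Restart the notebook kernel so the downgraded Pandas is picked up.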

You can find more information in this open GitHub issue: https://github.com/jupyter-incubator/sparkmagic/issues/458.

Let us know if there is any other way we can be of assistance.

Thanks,
Neelam

Upvotes: 5
