Reputation: 1
I noticed something weird when using EMR Notebooks attached to an EMR 6.1.0 cluster running Hadoop, Spark, and Livy.
You see, the packages I install on my master node are not available in the default Python3 kernel, but they are available in the default PySpark kernel.
When I get the hostname in the PySpark kernel, I can see it matches the private DNS name of my master node. However, when I use the Python3 or Terminal kernel, I get a different hostname, one that does not match any of the nodes in my cluster.
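For reference, this is roughly how I'm checking the hostname in each kernel (a minimal sketch; the DNS name in the comment is a placeholder):

```python
import socket

# Hostname of the machine this kernel is actually executing on
print(socket.gethostname())

# PySpark kernel -> matches the master node's private DNS name,
#                   e.g. ip-10-0-0-123.ec2.internal (placeholder)
# Python3 kernel -> some other host that is not part of my cluster
```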
Where is the Python3 code running when I use the Python3 kernel? Which EC2 machine's terminal am I using when I select the Terminal kernel? I've checked, and there's no Docker container running on my master machine either.
Is it possible to use my master node's Python3 as the kernel instead of those?
Upvotes: 0
Views: 476
Reputation: 348
You can try EMR release 5.32.0+ or 6.2.0+ to get a consistent experience between the Python3 and PySpark kernels. The difference exists because, starting with those releases, EMR uses Jupyter Enterprise Gateway to run kernels directly on the cluster. Before those releases, kernels did not run on the cluster but on the notebook instance, and the PySpark kernel used Livy to remotely submit Spark jobs to the cluster.
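If you're creating a new cluster, something like the boto3 sketch below requests such a release and installs the Enterprise Gateway alongside the other applications. This is only a minimal sketch: the region, cluster name, and instance types are placeholders, and it assumes the default EMR roles already exist in your account.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # region is a placeholder

# Request an EMR release (6.2.0+) whose notebook kernels run on the cluster
# itself via Jupyter Enterprise Gateway. Names below are placeholders.
response = emr.run_job_flow(
    Name="notebook-cluster",
    ReleaseLabel="emr-6.2.0",
    Applications=[
        {"Name": "Hadoop"},
        {"Name": "Spark"},
        {"Name": "Livy"},
        {"Name": "JupyterEnterpriseGateway"},
    ],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    # Assumes the default EMR service/instance roles already exist
    ServiceRole="EMR_DefaultRole",
    JobFlowRole="EMR_EC2_DefaultRole",
)
print(response["JobFlowId"])
```

With a notebook attached to a cluster like this, the hostname check from the question should resolve to a node inside the cluster for the Python3 kernel as well.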
Upvotes: 1