Matt

Reputation: 5684

How do I make matplotlib work in AWS EMR Jupyter notebook?

This is very close to the following question, but I have added a few details specific to my case:

Matplotlib Plotting using AWS-EMR jupyter notebook

I would like to find a way to use matplotlib inside my Jupyter notebook. Here is the failing code snippet; it is fairly simple:

notebook

import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt
plt.plot([1,2,3,4])
plt.show()

I chose this snippet because the following line fails on its own, since it tries to use Tkinter (which is not installed on an AWS EMR cluster):

import matplotlib.pyplot as plt

When I run the full notebook snippet, there is no runtime error, but nothing happens either (no graph is shown). My understanding is that one way to make this work is by adding either of the following snippets:

pyspark magic notation

%matplotlib inline

results

unknown magic command 'matplotlib'
UnknownMagic: unknown magic command 'matplotlib'

IPython explicit magic call

from IPython import get_ipython
get_ipython().run_line_magic('matplotlib', 'inline')

results

'NoneType' object has no attribute 'run_line_magic'
Traceback (most recent call last):
AttributeError: 'NoneType' object has no attribute 'run_line_magic'

As I understand it, each of these snippets, once added to my notebook, invokes a Spark magic command that inlines matplotlib plots. I have tried both after using a bootstrap action:

EMR bootstrap

sudo pip install matplotlib
sudo pip install ipython

Even with these added, I still get an error saying that there is no magic for matplotlib. So my question is:

Question

How do I make matplotlib work in an AWS EMR Jupyter notebook?

(Or how do I view graphs and plot images in AWS EMR Jupyter notebook?)
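Some context on the silent plt.show() above: Agg is a non-interactive backend that draws into memory, so there is no window for show() to open. The following is only a minimal sketch, assuming matplotlib is importable wherever the cell actually runs, of rendering the figure to PNG bytes instead. The helper name render_png is hypothetical, and this does not by itself display anything in an EMR notebook.

```python
import io


def render_png(values):
    """Plot `values` with the Agg backend and return the figure as PNG bytes.

    `render_png` is a hypothetical helper name, not part of matplotlib.
    """
    # Imported inside the function so the backend is chosen before pyplot loads.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend: renders in memory only
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(values)

    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")  # write the rendered figure into the buffer
    plt.close(fig)  # release the figure so repeated calls do not accumulate
    return buffer.getvalue()
```

Calling render_png([1, 2, 3, 4]) should return bytes beginning with the PNG file signature, which can then be written to a file or passed to whatever display mechanism the environment offers.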

Upvotes: 30

Views: 17151

Answers (7)

ststst

Reputation: 41

%matplot plt

Running this after the plt.show() call works for me.

Upvotes: 3

Chris Tokita

Reputation: 11

To plot something in AWS EMR notebooks, you simply need to use %matplot plt. You can see this documented about midway down this page from AWS.

For example, if I wanted to make a quick plot:

import matplotlib.pyplot as plt

plt.clf()  # clears the previous plot held in EMR memory
plt.plot([1,2,3,4])
plt.show()

%matplot plt

Upvotes: 1

Madaditya

Reputation: 173

The answer by @00schneider actually works.

import matplotlib.pyplot as plt

# plot data here
plt.show()

After plt.show(), re-run the magic cell that contains the line below, and you will see the plot in your AWS EMR Jupyter PySpark notebook:

%matplot plt

Upvotes: 14

00schneider

Reputation: 788

Import matplotlib as follows:

import matplotlib.pyplot as plt

Then use the magic command %matplot plt instead of %matplotlib inline, as shown in this tutorial: https://aws.amazon.com/de/blogs/big-data/install-python-libraries-on-a-running-cluster-with-emr-notebooks/

Upvotes: 4

vinay

Reputation: 1

Try the code below. FYI, we have matplotlib 3.1.1 installed for Python 3.6 on emr-5.26.0, and I used the PySpark kernel. Make sure that "%matplotlib inline" is the first line in the cell.

%matplotlib inline

import matplotlib
import matplotlib.pyplot as plt
plt.plot([1,2,3,4])
plt.show()
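Whether "%matplotlib inline" is recognized depends on whether the cell actually runs inside an IPython shell; the 'NoneType' error quoted in the question suggests that, in that setup, it did not. A minimal sketch of a check that can be run from plain Python follows. The function name try_enable_inline is hypothetical, and the sketch assumes nothing beyond IPython's get_ipython and run_line_magic, both of which appear in the question.

```python
def try_enable_inline():
    """Return True if `%matplotlib inline` was enabled, False if no IPython shell exists.

    `try_enable_inline` is a hypothetical helper name used for illustration.
    """
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed in this interpreter

    shell = get_ipython()
    if shell is None:
        # Plain Python process (no interactive shell), so no magics are available.
        return False

    # Equivalent to typing `%matplotlib inline` in a cell.
    shell.run_line_magic("matplotlib", "inline")
    return True
```

If this returns False where the cell executes, the %matplotlib magic cannot work there, and a kernel-specific mechanism is needed instead.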

Upvotes: -1

Foxan Ng

Reputation: 7151

As you mentioned, matplotlib is not installed on the EMR cluster, so an error like this occurs:

error

However, it is actually available in the managed Jupyter notebook instance (the Docker container). Using the %%local magic allows you to run the cell locally:

local

Upvotes: 8

The following should work:

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([1,2,3,4])

Run the entire script in one cell.

Upvotes: 2
