alfonso

Reputation: 884

In pandas and matplotlib, why is plt not called directly on the dataframe?

It's not clear to me why plotting is done like this:

import pandas as pd
import matplotlib.pyplot as plt

df.boxplot(column='initial_cost', by='Borough', rot=90)

plt.show()

How is the dataframe tied to plt.show()? I've done a few web searches and even took a look at the documentation(!) but couldn't find anything addressing this specifically.

I would expect something more like:

boxplot = df.boxplot(column='initial_cost', by='Borough', rot=90)
plt.show(boxplot)

Or even something like this:

boxplot = df.boxplot(column='initial_cost', by='Borough', rot=90)
boxplot.plt.show()

Upvotes: 2

Views: 168

Answers (1)

filippo

Reputation: 5294

Matplotlib provides a MATLAB-like state machine, the pyplot module, which instantiates and manages under the hood all the objects you need to draw a plot.

Pandas hooks into that in the same fashion. When you call df.boxplot(), it takes care of loading pyplot and creating a matplotlib Figure, an Axes, several Line2D objects and everything else that makes up a boxplot.

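When you call plt.show(), pyplot looks up all the figures that were created through the state-machine API, builds a GUI for them and takes care of displaying it. That is why plt.show() needs no argument: the figure pandas created is already registered with pyplot.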
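To make the link between the dataframe and pyplot visible, here is a minimal sketch. The dataframe contents are made up for illustration (only the column names come from the question), and the Agg backend is selected just so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the question's dataframe.
df = pd.DataFrame({
    "initial_cost": [100.0, 250.0, 80.0, 300.0, 120.0, 90.0],
    "Borough": ["Bronx", "Bronx", "Queens", "Queens", "Brooklyn", "Brooklyn"],
})

plt.close("all")                      # start from a clean pyplot state
figures_before = plt.get_fignums()    # figure numbers pyplot currently tracks

# pandas creates the figure through pyplot, so pyplot now knows about it.
df.boxplot(column="initial_cost", by="Borough", rot=90)

figures_after = plt.get_fignums()
current_figure = plt.gcf()            # the figure pandas just drew on

print(figures_before, figures_after)
plt.show()  # with a GUI backend this would display every tracked figure
```

The list returned by plt.get_fignums() grows by one after the df.boxplot() call, and that registry is what plt.show() works from.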

If you need more control, you can of course do it all yourself with the object-oriented API: create a figure, add axes and draw the canvas manually. It's all there if needed.
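As a rough sketch of that fully manual route, the following builds a boxplot without importing pyplot at all. The data and labels are invented for illustration, and choosing the Agg canvas and rendering to an in-memory PNG are my assumptions, not something the question requires.

```python
import io

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# Build the objects by hand: no pyplot, so nothing is registered globally.
fig = Figure(figsize=(4, 3))
canvas = FigureCanvasAgg(fig)   # attach a canvas that can render the figure
ax = fig.add_subplot(111)

# Hypothetical cost values, one list per borough.
ax.boxplot([[100.0, 250.0], [80.0, 300.0], [120.0, 90.0]])
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["Bronx", "Queens", "Brooklyn"], rotation=90)
ax.set_ylabel("initial_cost")

canvas.draw()                   # manually draw the canvas

# Render to an in-memory PNG instead of showing a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```

Because this figure never went through pyplot, plt.show() would not know about it. You are responsible for rendering or saving it yourself.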

As far as I've seen, the common practice is a mix of both: hook into the object-oriented API when needed, but still let pyplot take care of displaying everything or saving it to a file.
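A sketch of that mixed style, under the same assumptions as before (made-up data, Agg backend so it runs headless): pyplot creates the figure and saves it, while the Axes object is handed to pandas and tweaked directly.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the question's dataframe.
df = pd.DataFrame({
    "initial_cost": [100.0, 250.0, 80.0, 300.0, 120.0, 90.0],
    "Borough": ["Bronx", "Bronx", "Queens", "Queens", "Brooklyn", "Brooklyn"],
})

# pyplot creates and tracks the figure, but we keep handles to the objects.
fig, ax = plt.subplots(figsize=(6, 4))

# Tell pandas which Axes to draw on instead of letting it create its own.
df.boxplot(column="initial_cost", by="Borough", rot=90, ax=ax)

# Object-oriented tweaks on the Axes we own.
ax.set_ylabel("initial_cost")
ax.set_title("Cost by borough")

# Let pyplot save the current figure (here to memory rather than a file).
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

With an interactive backend, plt.show() in place of plt.savefig() would display the same figure.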

Upvotes: 2
