Reputation: 1143
I cannot wrap my head around the axes parameter: what it contains and how to use it for making subplots.
I would really appreciate it if someone could explain what is going on in the following example:
fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(15, 10))
for idx, feature in enumerate(df.columns[:-1]):
    df.plot(feature, "cnt", subplots=True, kind="scatter", ax=axes[idx / 4, idx % 4])
Here is the data (UCI Bike sharing dataset):
Here is the output of the code snippet (a pairwise comparison of features and the end results):
To be more specific, here are the parts that I do understand (at least I think I do)
Here is what I do not understand
Upvotes: 4
Views: 2298
Reputation: 339580
Concerning the last question, about the array indexing [idx / 4, idx % 4]:
The idea is to loop over all subplots and all dataframe columns at the same time. The problem is that the axes array is two-dimensional while the column array is one-dimensional. One therefore needs to decide which of the two to loop over, and then map the loop index (or indices) to the other dimension.
An intuitive way would be to use two loops:
for i in range(axes.shape[0]):
    for j in range(axes.shape[1]):
        df.plot(df.columns[i*axes.shape[1]+j], "cnt", ... , ax=axes[i,j])
Here, i*axes.shape[1]+j maps the two dimensions of the numpy array to the single dimension of the columns list: every completed row advances the column index by the number of columns in the grid, axes.shape[1].
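As a quick sanity check of this kind of mapping, here is a minimal sketch in plain Python (no plotting), assuming the 3-by-4 grid from the question. The names `nrows`, `ncols` and `flat_index` are illustrative only and not part of matplotlib or pandas. The row index is multiplied by the number of columns so that every cell gets a distinct position.

```python
nrows, ncols = 3, 4  # grid shape from the question


def flat_index(i, j, ncols):
    """Row-major position of cell (i, j) in a grid with `ncols` columns."""
    return i * ncols + j


# Visit the grid row by row and collect the flat index of each cell.
indices = [flat_index(i, j, ncols) for i in range(nrows) for j in range(ncols)]
print(indices)
```

If the twelve positions come out as 0 through 11 with no repeats, each subplot is paired with a different column.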
In the example from the question, the loop is over the columns, which means we have to somehow map the one-dimensional index to two dimensions. This is what [idx / 4, idx % 4] does, or rather should do. It only works in Python 2; in Python 3, / is true division and returns a float, which cannot be used as an array index. To make it more comprehensible and version-safe, one should use [idx // 4, idx % 4]. The // makes it clear that integer division is used. So for the first 4 idx values (0,1,2,3), idx // 4 is 0, for the next set of 4 values it is 1, and so on. idx % 4 calculates the index modulo 4. So (0,1,2,3) are mapped to (0,1,2,3), and then (4,5,6,7) are mapped to (0,1,2,3) again, etc.
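To see the difference between the two division operators under Python 3, here is a small sketch. A nested list stands in for the 3-by-4 axes array (the `ax..` labels are made up), so it runs without matplotlib; NumPy arrays also reject float indices, though with a different exception type.

```python
idx = 5

print(idx / 4)   # true division, gives a float in Python 3
print(idx // 4)  # floor division, gives an int
print(idx % 4)   # remainder, gives an int

# Stand-in for the 3-by-4 axes array; the labels are purely illustrative.
grid = [[f"ax{r}{c}" for c in range(4)] for r in range(3)]

# Integer row and column indices select exactly one cell.
cell = grid[idx // 4][idx % 4]
print(cell)

# A float index is rejected, which is why idx / 4 fails in Python 3.
try:
    grid[idx / 4]
except TypeError as exc:
    float_index_error = str(exc)
    print(float_index_error)
```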
An alternative solution using a single loop would be to flatten the axes array:
for idx, feature in enumerate(df.columns[:-1]):
    df.plot(feature, "cnt", ... , ax=axes.flatten()[idx])
or, perhaps most Pythonic:
for ax, feature in zip(axes.flatten(), df.columns[:-1]):
    df.plot(feature, "cnt", ... , ax=ax)
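One property of the zip variant worth knowing: zip stops at the shorter of its two inputs, so any surplus subplots are simply left empty. The sketch below illustrates this with plain strings standing in for the flattened axes, and assumes, purely for illustration, 11 features for 12 subplots; the feature names are hypothetical.

```python
# Stand-ins for axes.flatten(): 12 slots in row-major order.
axes_flat = [f"axes[{r},{c}]" for r in range(3) for c in range(4)]
# Hypothetical feature names; the real ones come from df.columns[:-1].
features = [f"feature_{k}" for k in range(11)]

# zip pairs items positionally and stops when the shorter input runs out.
pairs = list(zip(axes_flat, features))
print(len(pairs))
print(pairs[-1])

# Whatever is left over never receives a plot.
unused = axes_flat[len(pairs):]
print(unused)
```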
Upvotes: 2
Reputation: 131710
The axes object in your code is a 2D Numpy array of matplotlib Axes objects. Since the call to subplots() asked for 3 rows and 4 columns, the array will be 3 by 4. Indexing into the array like axes[r, c] gives you the Axes object that corresponds to row r and column c, and you can pass that object as the ax keyword argument to a plotting method to make the plot show up on that axis. E.g. if you wanted to plot something in the second row and second column, you would call plot(..., ax=axes[1,1]).
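A minimal sketch of that, assuming matplotlib is installed. The Agg backend is selected only so the snippet runs without a display, and the plotted points are made up; here the Axes method is called directly instead of passing ax= to a pandas plotting call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(15, 10))

# axes is a NumPy array of Axes objects with one entry per subplot.
print(type(axes).__name__, axes.shape)

# Pick the subplot in the second row and second column (indices start at 0).
target = axes[1, 1]
target.scatter([1, 2, 3], [3, 1, 2])  # made-up data points
target.set_title("row 1, column 1")
```

Only the chosen subplot receives data; the other eleven stay empty until something is drawn on them.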
The code uses [idx / 4, idx % 4] as a way of converting the indices (numbers from 0 to 11) into locations in the 3-by-4 grid (in Python 3, write idx // 4 so that the row index is an integer). Try evaluating that expression yourself with idx set to each value from 0 to 11 in turn, and you'll see how it works out.
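That evaluation can be done in a few lines. The sketch below uses the built-in divmod, which returns the quotient and remainder together, i.e. the same pair as (idx // 4, idx % 4); `ncols` and `positions` are illustrative names.

```python
ncols = 4  # number of columns in the subplot grid

# divmod(idx, ncols) is equivalent to (idx // ncols, idx % ncols).
positions = [divmod(idx, ncols) for idx in range(12)]

for idx, (row, col) in enumerate(positions):
    print(idx, "->", (row, col))
```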
Upvotes: 2