Daniel Gray

Reputation: 41

Unwanted white column in matplotlib -- how to remove?

Here is Python code (ported from Richard McElreath's excellent Statistical Rethinking) that produces an unwanted white, transparent 'column' in my plot:

import numpy as np
import pandas as pd
import scipy.stats

import matplotlib.pyplot as plt

# import data
url = "https://raw.githubusercontent.com/pymc-devs/resources/master/Rethinking_2/Data/Howell1.csv"


df = pd.read_csv(url, delimiter = ';')
df2 = df[df.age >= 18]

# sample priors (prior predictive check)

n = 100
a = scipy.stats.norm.rvs(178, 20, n)
b1 = scipy.stats.norm.rvs(0, 10, n)
b2 = np.exp(scipy.stats.norm.rvs(0, 1, n))

xbar = df2.weight.mean()


# compare 2 priors

fig,ax = plt.subplots(1,2,sharey=True)

for i in range(100):
    ax[0].plot(df2.weight, a[i] + b1[i]*(df2.weight - xbar),color = 'grey',lw=.5,alpha=.2)
    ax[0].set_xlabel('weight')
    ax[0].set_ylabel('height')
    ax[0].set_title('normal prior of β')
    ax[1].plot(df2.weight, a[i] + b2[i]*(df2.weight - xbar),color = 'grey',lw=.5,alpha=.2)
    ax[1].set_xlabel('weight')
    ax[1].set_title('log-normal prior of β')

plt.axis([30,60,-100,400])
plt.show()

[image: matplotlib output]

This occurs in my Jupyter notebook, in Google Colab, and in the PDF written by plt.savefig.

My notebook versions: numpy 1.19.4 pandas 1.1.5 scipy 1.5.4 matplotlib 3.3.3

Thanks!!

Upvotes: 4

Views: 204

Answers (3)

tacaswell

Reputation: 87546

This is a data artifact interacting with anti-aliasing in an interesting way. In the final image we have to pick a color for every pixel. Without anti-aliasing, when we draw a line we have to decide whether each pixel is "in" the line (so we color it) or "out" (so we do not), which can lead to stair-stepped lines, particularly for lines that are close to flat. With anti-aliasing we color the pixel based on how much of it is "in" the line. That smearing fools the eye (in a good way) and we see a more convincing straight line.

Without anti-aliasing or alpha, drawing the same line multiple times does not change its appearance (any given pixel is still in or out). With anti-aliasing or alpha, every time you draw the line the "partial" pixels get darker.
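To put rough numbers on "gets darker every time", here is a minimal sketch of the arithmetic. It assumes each pass is blended independently with the standard "over" compositing rule; the helper name `stacked_opacity` is made up for illustration, and the sketch does not verify how the renderer actually treats repeated segments inside a single line.

```python
def stacked_opacity(alpha, n_draws):
    """Opacity of a pixel after painting the same translucent ink n_draws times.

    Assumes "over" compositing: each pass covers a fraction `alpha` of
    whatever background is still showing through.
    """
    return 1.0 - (1.0 - alpha) ** n_draws


# alpha=0.2 matches the plot calls in the question
for n in (1, 2, 5, 10):
    print(n, round(stacked_opacity(0.2, n), 3))
# 1 0.2
# 2 0.36
# 5 0.672
# 10 0.893
```

Under that assumption, a stretch of the line that is painted a handful of times ends up several times darker than a stretch painted once, which is enough to read as a visible band.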

In the original data the values in df2.weight all fall on the same line, but they are not sorted, so as we draw, the path goes back and forth over itself (see the trace in the left-center panel). Depending on exactly where the turning points are and how many times any given segment is traversed, the line will look darker in some places than others. Something in the exact structure of the data is causing that "band".
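The back-and-forth can be made concrete by counting how often the path crosses each part of the x axis. This is only a sketch: it uses synthetic normally distributed values as a stand-in for df2.weight (an assumption, so it runs without downloading the data), and `traversal_counts` is a hypothetical helper, not a matplotlib function.

```python
import numpy as np


def traversal_counts(x, edges):
    """Count how many polyline segments pass over each bin of the x axis.

    A segment from x[i] to x[i+1] is counted for a bin when the bin's
    midpoint lies strictly between the segment's two endpoints.
    """
    x = np.asarray(x, dtype=float)
    mids = 0.5 * (edges[:-1] + edges[1:])
    lo = np.minimum(x[:-1], x[1:])
    hi = np.maximum(x[:-1], x[1:])
    # broadcast: segments along rows, bin midpoints along columns
    covered = (lo[:, None] < mids[None, :]) & (mids[None, :] < hi[:, None])
    return covered.sum(axis=0)


rng = np.random.default_rng(0)
weights = rng.normal(45, 6, size=300)  # synthetic stand-in for df2.weight
edges = np.linspace(weights.min(), weights.max(), 11)

unsorted_counts = traversal_counts(weights, edges)
sorted_counts = traversal_counts(np.sort(weights), edges)

print(unsorted_counts)  # counts differ from bin to bin
print(sorted_counts)    # one pass per bin
```

With unsorted x values the number of passes varies along the axis, while the sorted path covers every bin once, which is consistent with the uneven darkness described above.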

If you increase the DPI, the pixels get smaller, so the effect becomes less pronounced (similar to zooming in). Turning off anti-aliasing also makes it less pronounced. I suspect (but have not tested) that if you shuffle the data you will be able to move the band around!

Sorting the weights (I do not think their order is meaningful in this context) produces the nicer-looking plots in the bottom two panels.

So in short, that band is "real" in the sense that it is representing something in the data rather than being a bug in the render process, but is highlighting structure in the data that I do not think is meaningful.

import numpy as np
import pandas as pd
import scipy.stats

import matplotlib.pyplot as plt

# import data
url = "https://raw.githubusercontent.com/pymc-devs/resources/master/Rethinking_2/Data/Howell1.csv"

# this is a mpl 3.3 feature
fig, ad = plt.subplot_mosaic(
    [
        ["normal", "log-normal"],
        ["trace", "hist"],
        ["sorted normal", "sorted log-normal"],
    ],
    constrained_layout=True,
)

df = pd.read_csv(url, delimiter=";")
df2 = df[df.age >= 18]
# sample priors (prior predictive check)

n = 100
a = scipy.stats.norm.rvs(178, 20, n)
b1 = scipy.stats.norm.rvs(0, 10, n)
b2 = np.exp(scipy.stats.norm.rvs(0, 1, n))


def inner(weights, a, b1, b2, ax_dict):
    xbar = np.mean(weights)
    for i in range(100):
        ax_dict["normal"].plot(
            weights, a[i] + b1[i] * (weights - xbar), color="grey", lw=0.5, alpha=0.2
        )
        ax_dict["normal"].set_xlabel("weight")
        ax_dict["normal"].set_ylabel("height")
        ax_dict["normal"].set_title("normal prior of β")
        ax_dict["log-normal"].plot(
            weights, a[i] + b2[i] * (weights - xbar), color="grey", lw=0.5, alpha=0.2
        )
        ax_dict["log-normal"].set_xlabel("weight")
        ax_dict["log-normal"].set_title("log-normal prior of β")


inner(df2.weight, a, b1, b2, ad)
inner(
    np.array(sorted(df2.weight)),
    a,
    b1,
    b2,
    {"normal": ad["sorted normal"], "log-normal": ad["sorted log-normal"]},
)

ad["hist"].hist(df2.weight, bins=100, color="0.5")
ad["hist"].set_xlabel("weight")
ad["hist"].set_ylabel("#")
ad["trace"].plot(df2.weight, "-o", color="0.5", alpha=0.5)
ad["trace"].set_ylabel("weight")
ad["trace"].set_xlabel("index")
plt.show()

[image: output of the script above]

Upvotes: 1

TheMultiplexer

Reputation: 325

I think you mean the region where the lines are drawn thinner/lighter and not the borders.

I found out it has to do with aliasing and not the data itself.

Play around with the antialiased parameter:

ax[0].plot(..., antialiased=False)

Looks like this:

[image: plot drawn with antialiased=False]

Admittedly this makes the plot look ugly, but you can increase the figure size or the dpi to compensate.

fig.set_dpi(300.0)
...
plt.show();

Then you get this:

[image: plot drawn at 300 dpi]

Upvotes: 3

r-beginners

Reputation: 35230

I set the values manually here, but if you set the axis limits for each graph, the margins disappear.

ax[0].axis([33,63,-100,400])
ax[1].axis([33,60,-100,400])
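For context on why per-axes limits matter: pyplot functions such as plt.axis act on the current axes only, which after plt.subplots(1, 2) should be the right-hand panel. The sketch below checks that and then sets limits on both panels in a loop. The limits are the ones from the question, used as placeholder values, and the off-screen Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 2, sharey=True)

# plt.axis acts on the current axes only, here the last one created
plt.axis([30, 60, -100, 400])
left_before = ax[0].get_xlim()   # x limits of the left panel are untouched
right_after = ax[1].get_xlim()   # x limits of the right panel were changed

# set the limits on every panel explicitly instead
for a in ax:
    a.set_xlim(30, 60)
    a.set_ylim(-100, 400)

print(left_before, right_after, ax[0].get_xlim())
```

Because the panels share a y axis, the y limits propagate to both either way; only the x limits need to be set per panel.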

[image: plot with per-axes limits]

If you want to make the spacing between the graphs narrower, you can do so in the following way.

fig.subplots_adjust(wspace=0.05)

[image: plot with wspace=0.05]

Upvotes: 0
