Paul Gowder

Reputation: 2539

matplotlib fill_between leaves gaps

I'm trying to recreate seaborn's fill-only confidence interval plotting in raw matplotlib. In doing so, I'm running into strange behavior where the fill_between function leaves gaps between the stuff it's supposed to be filling.

I'm using real-world data on this, but it's well-behaved data: the x values are in the range of about 0-15, and the y values in a range of about 25-85. I'm using statsmodels to fit the line and generate the confidence intervals with essentially the code from this prior SO post, and the fitted values as well as the upper and lower bounds of the confidence intervals are as they should be (the ranges are appropriate, etc.). So there's nothing wrong with the data.

Here's the relevant part of the code:

import matplotlib.pyplot as plt
import statsmodels.api as sm

def make_plot(x, y):
    fig = plt.figure(figsize=(12, 9))
    ax = fig.add_subplot(1, 1, 1)
    # scatter of the raw data
    ax.plot(x, y, 'k.', ms=5)
    ax.locator_params(nbins=3)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    # OLS fit with an intercept term
    regline = sm.OLS(y, sm.add_constant(x)).fit()
    fitted = regline.fittedvalues
    ax.plot(x, fitted, color=(0.2, 0.2, 0.2, 0.2), linewidth=2)
    # get_ci_values: helper (not shown) adapted from the prior SO code
    ci_low, ci_high = get_ci_values(regline)
    # shade from each confidence bound to the regression line
    ax.fill_between(x, ci_low, fitted, facecolor=(0.4, 0.4, 0.9, 0.2))
    ax.fill_between(x, ci_high, fitted, facecolor=(0.9, 0.4, 0.4, 0.2))
    return fig

The line fill works fine until it hits around x=10, y=50, and then it starts to leave bizarre gaps where it doesn't come all the way to the regression line. Here's an example:

image with horrible gap

What have I done wrong here? I've tried a bunch of stuff, including:

but none of that makes any difference.

I also note that seaborn manages to make its beautiful fills using fill_between, using exactly the same strategy, and seaborn's plotting works correctly on the data I'm using...

Upvotes: 1

Views: 1479

Answers (1)

ImportanceOfBeingErnest

Reputation: 339795

One cannot know for sure because the question is missing the essential part, namely the data itself (see Minimal, Complete, and Verifiable example).

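The strong suspicion here, however, is that the data is not sorted: fill_between connects the points in the order they are given, so x values that jump back and forth produce an outline that crosses itself.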

The (untested) solution would be to sort the data,

import numpy as np
order = np.argsort(x)  # compute the sort order once and reuse it
ax.plot(np.sort(x), fitted[order])
ax.fill_between(np.sort(x), ci_low[order], fitted[order])
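The question's make_plot draws two fills, so ci_high needs the same ordering as ci_low and fitted. A small helper can keep everything aligned. This is only a sketch: the name sort_by_x is made up for illustration, the sample values are invented, and it assumes all inputs are equal-length 1-D array-likes.

```python
import numpy as np

def sort_by_x(x, *ys):
    """Return x and each y series reordered so that x is ascending.

    Hypothetical helper, not part of matplotlib or statsmodels.
    np.asarray makes positional indexing work for lists and pandas Series alike.
    """
    x = np.asarray(x)
    order = np.argsort(x, kind="stable")  # stable: tied x values keep input order
    return (x[order], *(np.asarray(y)[order] for y in ys))

# Made-up values standing in for x, ci_low, fitted and ci_high.
x = [3.0, 1.0, 2.0]
ci_low = [5.0, 1.0, 3.0]
fitted = [6.0, 2.0, 4.0]
ci_high = [7.0, 3.0, 5.0]

xs, lo, fit, hi = sort_by_x(x, ci_low, fitted, ci_high)
```

The sorted arrays would then go into the plotting calls in place of the originals, e.g. ax.fill_between(xs, lo, fit, ...) and ax.fill_between(xs, hi, fit, ...).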

To understand why values need to be sorted, maybe a picture can tell more than a thousand words.

enter image description here
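The effect can also be checked numerically without plotting. The sketch below uses made-up data and hypothetical helper names (band_polygon, shoelace_area). It assumes, as a simplification, that fill_between outlines the band by walking along one curve in the given order and back along the other; the real implementation may differ in detail. With sorted x the outline encloses the whole band; with scrambled x the outline doubles back on itself and the net enclosed area shrinks.

```python
import numpy as np

def band_polygon(x, lower, upper):
    """Outline of the band, visiting the points in the order they are given:
    forward along the lower curve, then back along the upper curve."""
    x, lower, upper = (np.asarray(a, dtype=float) for a in (x, lower, upper))
    xs = np.concatenate([x, x[::-1]])
    ys = np.concatenate([lower, upper[::-1]])
    return xs, ys

def shoelace_area(xs, ys):
    """Absolute net area enclosed by a closed polygon (shoelace formula)."""
    return 0.5 * abs(np.dot(xs, np.roll(ys, -1)) - np.dot(ys, np.roll(xs, -1)))

# Made-up data: a fitted line y = 2x + 1 and a lower bound one unit below it.
x_sorted = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
fitted_sorted = 2.0 * x_sorted + 1.0
low_sorted = fitted_sorted - 1.0

# The same points in scrambled order, as unsorted real-world data would be.
perm = np.array([0, 3, 1, 4, 2])
x_unsorted = x_sorted[perm]
fitted_unsorted = fitted_sorted[perm]
low_unsorted = low_sorted[perm]

area_sorted = shoelace_area(*band_polygon(x_sorted, low_sorted, fitted_sorted))
area_unsorted = shoelace_area(*band_polygon(x_unsorted, low_unsorted, fitted_unsorted))

# Re-sorting with a single argsort restores the full band.
order = np.argsort(x_unsorted)
area_fixed = shoelace_area(
    *band_polygon(x_unsorted[order], low_unsorted[order], fitted_unsorted[order])
)

print(area_sorted, area_unsorted, area_fixed)  # 4.0 2.0 4.0
```

In this toy case the scrambled outline only encloses the part of the band up to x=2, which resembles the symptom in the question: the fill looks fine up to some point and then leaves gaps.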

Upvotes: 4
