matplotlib fill_between leaves gaps

Question

I'm trying to recreate seaborn's fill-only confidence interval plotting in raw matplotlib. In doing so, I'm running into strange behavior where the fill_between function leaves gaps in the region it's supposed to fill.

I'm using real-world data on this, but it's well-behaved data: the x values range from about 0 to 15, and the y values from about 25 to 85. I'm using statsmodels to fit the line and generate the confidence intervals with essentially the code from this prior SO question, and the fitted values as well as the upper and lower bounds of the confidence intervals are as they should be (the ranges are appropriate, etc.). So there's nothing wrong with the data.

Here's the relevant part of the code:

def make_plot(x, y):
    fig = plt.figure(figsize=(12, 9))
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(x, y, 'k.', ms=5)
    ax.locator_params(nbins=3)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    regline = sm.OLS(y, sm.add_constant(x)).fit()
    fitted = regline.fittedvalues
    ax.plot(x, fitted, color=(0.2, 0.2, 0.2, 0.2), linewidth=2)
    ci_low, ci_high = get_ci_values(regline)
    ax.fill_between(x, ci_low, fitted, facecolor=(0.4, 0.4, 0.9, 0.2))
    ax.fill_between(x, ci_high, fitted, facecolor=(0.9, 0.4, 0.4, 0.2))
    return fig

The line fill works fine until it hits around x=10, y=50, and then it starts to leave bizarre gaps where it doesn't come all the way to the regression line. Here's an example:

What have I done wrong here? I've tried a bunch of stuff, including:

adding lines for the low and high confidence intervals
adding interpolate=True to the fill_between calls
adding where=x>0 to the fill_between calls

but none of that makes any difference.

I also note that seaborn manages to make its beautiful fills using fill_between, using exactly the same strategy, and seaborn's plotting works correctly on the data I'm using...

ImportanceOfBeingErnest · Accepted Answer

One cannot know for sure because the question is missing the essential part, namely the data itself (see Minimal, Complete, and Verifiable example).

The strong suspicion here would however be that the data is not sorted.
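
A quick way to test that suspicion is to check whether x is in non-decreasing order. This is only a minimal sketch: `is_sorted` is a hypothetical helper name, and the sample values are made up rather than taken from the question's data.

```python
import numpy as np


def is_sorted(values):
    """Return True if values are in non-decreasing order."""
    values = np.asarray(values)
    # Every consecutive difference must be >= 0 for a sorted sequence.
    return bool(np.all(np.diff(values) >= 0))


x_example = np.array([3.0, 1.0, 2.0])  # made-up values, not the asker's data
print(is_sorted(x_example))            # False
print(is_sorted(np.sort(x_example)))   # True
```

If this returns False for the x passed to fill_between, sorting is worth trying before anything else.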

The (untested) solution would be to sort the data,

order = np.argsort(x)  # compute the sort order once and reuse it
xs = np.asarray(x)[order]
ax.plot(xs, fitted[order])
ax.fill_between(xs, ci_low[order], fitted[order])

To understand why the values need to be sorted, maybe a picture can tell more than a thousand words.
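
Such a picture can be reproduced with a small sketch along the following lines. Everything in it is an assumption made for illustration: the data is synthetic (a shuffled x on 0 to 15, a made-up straight line and a made-up lower bound), and `compare_fill_order` is a hypothetical name. fill_between builds its polygon from the points in the order they are passed, so with a shuffled x the outline doubles back on itself and the filled area no longer matches the band between the two curves.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def compare_fill_order(seed=0):
    """Draw the same band twice: x in shuffled order, then x sorted."""
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.linspace(0, 15, 40))   # deliberately unsorted
    fitted = 30 + 3 * x                            # made-up straight line
    ci_low = fitted - (2 + 0.1 * (x - 7.5) ** 2)   # made-up lower bound

    fig, (ax_raw, ax_sorted) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

    # Left panel: points are joined in the shuffled order they were given.
    ax_raw.fill_between(x, ci_low, fitted, alpha=0.4)
    ax_raw.set_title("x as given (unsorted)")

    # Right panel: the same points, reordered so that x increases.
    order = np.argsort(x)
    ax_sorted.fill_between(x[order], ci_low[order], fitted[order], alpha=0.4)
    ax_sorted.set_title("x sorted")
    return fig, x, order
```

Calling `fig, x, order = compare_fill_order()` and then `fig.savefig("fill_order.png")` writes both panels side by side; only the right-hand one should show a clean band.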
