pjw

Reputation: 2345

Matplotlib boxplot show only max and min fliers

I am making standard Matplotlib boxplots using the plt.boxplot() command. My line of code that creates the boxplot is:

bp = plt.boxplot(data, whis=[5, 95], showfliers=True)

Because my data has a wide distribution, I am getting a lot of fliers outside the range of the whiskers. To get a cleaner, publication-quality plot, I would like to show only a single flier at the maximum and a single flier at the minimum of the data, instead of all fliers. Is this possible? I don't see any built-in option for this in the documentation.

(I can set the range of the whiskers to max/min, but this is not what I want. I would like to keep the whiskers at the 5th and 95th percentiles).

Below is the figure I am working on. Notice the density of fliers.

[Figure: boxplots]

Upvotes: 9

Views: 14764

Answers (2)

Geotob

Reputation: 2945

plt.boxplot() returns a dictionary whose 'fliers' key holds the flier points as Line2D objects. You can modify them after the boxplot call and before showing or saving the figure, like this:

Only on matplotlib >= 1.4.0

bp = plt.boxplot(data, whis=[5, 95], showfliers=True)

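# bp['fliers'] is a list of Line2D objects, one per box, each holding
# all flier points of that box (below and above the whiskers).
fliers = bp['fliers']

# Keep only the lowest and highest flier of each box. The points are not
# guaranteed to be sorted, so use min/max rather than the first/last entry.
for fly in fliers:
    xdata, ydata = fly.get_data()
    if len(ydata) == 0:
        continue  # this box has no fliers
    fly.set_data([xdata[0], xdata[0]], [min(ydata), max(ydata)])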
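
For reference, here is a self-contained sketch of this approach that can be run as-is. The helper name keep_extreme_fliers and the synthetic data are illustrative assumptions, not part of Matplotlib. It picks the extremes with argmin/argmax rather than relying on the order of the flier points.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def keep_extreme_fliers(bp):
    """Reduce each flier Line2D in a boxplot dict to its min and max points.

    Hypothetical helper, assuming matplotlib >= 1.4 (one Line2D per box).
    """
    for line in bp["fliers"]:
        x = np.asarray(line.get_xdata())
        y = np.asarray(line.get_ydata())
        if y.size == 0:
            continue  # this box has no fliers
        lo, hi = y.argmin(), y.argmax()
        line.set_data([x[lo], x[hi]], [y[lo], y[hi]])


# Synthetic data, for illustration only
rng = np.random.default_rng(0)
data = [rng.normal(size=500), rng.lognormal(size=500)]

fig, ax = plt.subplots()
bp = ax.boxplot(data, whis=[5, 95], showfliers=True)
keep_extreme_fliers(bp)
```

After the call, each box shows at most two flier markers; finish with fig.savefig(...) or plt.show() as usual.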

On older versions

If you are on an older version of matplotlib, the fliers for each boxplot are represented by two lines, not one. Thus, the loop would look something like this:

import numpy as np
for i in range(len(fliers)):
    xdata, ydata = fliers[i].get_data()
    if len(ydata) == 0:
        continue  # no fliers on this side of the box
    # Even indices hold the upper fliers, odd indices the lower ones,
    # so keep the maximum for even i and the minimum for odd i.
    if i % 2 == 0:
        idx = np.argmax(ydata)
    else:
        idx = np.argmin(ydata)
    fliers[i].set_data([xdata[idx]], [ydata[idx]])

Also note that the showfliers argument doesn't exist in matplotlib < 1.4, and the whis argument doesn't accept lists there.

Of course (for simple applications) you could plot the boxplot without fliers and add the max and min points to the plot:

bp = plt.boxplot(data, whis=[5, 95], showfliers=False)
sc = plt.scatter([1, 1], [data.min(), data.max()])

where [1, 1] gives the x-positions of the two points (a single box is drawn at x = 1 by default) and data is assumed to be a NumPy array.
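
With several boxes, the same idea might look like the sketch below. It assumes the default box positions 1..n and uses made-up data; the variable names and the marker styling are only suggestions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, for illustration only: three groups of 300 samples
rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, size=300) for m in (0.0, 1.0, 2.0)]

fig, ax = plt.subplots()
ax.boxplot(groups, whis=[5, 95], showfliers=False)

# By default, boxes are drawn at x = 1, 2, ..., n
positions = np.arange(1, len(groups) + 1)
mins = np.array([g.min() for g in groups])
maxs = np.array([g.max() for g in groups])

# One hollow marker at the minimum and one at the maximum of each group
pts = ax.scatter(
    np.concatenate([positions, positions]),
    np.concatenate([mins, maxs]),
    marker="o", facecolors="none", edgecolors="k",
)
```

If the boxplot is drawn with a custom positions argument, the same values should be passed to the scatter call instead of 1..n.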

Upvotes: 5

pjw

Reputation: 2345

import numpy as np

fliers = bp['fliers']
for box in fliers:  # one Line2D of flier points per boxplot
    xdata, ydata = box.get_xdata(), box.get_ydata()
    if len(ydata) == 0:
        continue  # this box has no fliers
    # All fliers of a box share the same x, so any value from xdata works.
    box.set_data([xdata[0], xdata[0]], [np.min(ydata), np.max(ydata)])

Resulting figure, showing only max and min fliers: [figure]

Upvotes: 3
