ScalaBoy

Reputation: 3392

How to sort box plot values in increasing order (by median values)?

This is my pandas DataFrame:

Area            Gender  Quantity
XXX             Men     115
XXX             Men     105    
XXX             Men     114
YYY             Men     100
YYY             Men     90    
YYY             Men     95
YYY             Men     101
XXX             Women   120    
XXX             Women   122
XXX             Women   115
XXX             Women   117    
YYY             Women   91
YYY             Women   90
YYY             Women   90

This is how I created a box plot.

import seaborn as sns
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(15,11))
ax = sns.boxplot(x="Area", y="Quantity", hue="Gender", data=df, palette="Set3")

I want to sort the Area groups by median Quantity in increasing order. How can I do it?

Upvotes: 1

Views: 2772

Answers (2)

skpro19

Reputation: 519

You can pass the order parameter to sns.boxplot() to control the order of the groups on the x-axis. See https://python-graph-gallery.com/35-control-order-of-boxplot/
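For the data in the question, the list to pass as order could be computed from the group medians. This is only a sketch: the names area_order and boxplot_by_median are made up for illustration, and seaborn is imported inside the function so that the ordering step runs on its own.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Area": ["XXX"] * 3 + ["YYY"] * 4 + ["XXX"] * 4 + ["YYY"] * 3,
    "Gender": ["Men"] * 7 + ["Women"] * 7,
    "Quantity": [115, 105, 114, 100, 90, 95, 101,
                 120, 122, 115, 117, 91, 90, 90],
})

# Areas sorted by their median Quantity, smallest first
area_order = (
    df.groupby("Area")["Quantity"]
      .median()
      .sort_values()
      .index
      .tolist()
)
print(area_order)  # ['YYY', 'XXX']


def boxplot_by_median(data, order):
    """Hypothetical helper: draw the grouped box plot with the given Area order."""
    import seaborn as sns  # imported lazily; only needed for plotting
    return sns.boxplot(x="Area", y="Quantity", hue="Gender",
                       data=data, order=order, palette="Set3")
```

Calling boxplot_by_median(df, area_order) followed by plt.show() should then draw YYY before XXX. Note that order only affects the Area groups; the order of the hue boxes inside each group is unchanged.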

Upvotes: 0

normanius

Reputation: 9762

This is not possible out of the box with current versions of seaborn (<=0.9.0). The best you can do at the moment is to set hue_order (for instance: ['Women', 'Men']), but that order is applied to every group alike, which is not what you want.
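A quick check of the per-group medians shows why one global hue_order cannot fit this data. This is a sketch using only pandas on the sample data from the question; the variable names are my own.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Area": ["XXX"] * 3 + ["YYY"] * 4 + ["XXX"] * 4 + ["YYY"] * 3,
    "Gender": ["Men"] * 7 + ["Women"] * 7,
    "Quantity": [115, 105, 114, 100, 90, 95, 101,
                 120, 122, 115, 117, 91, 90, 90],
})

# Median Quantity per (Area, Gender): one row per Area, one column per Gender
medians = df.groupby(["Area", "Gender"])["Quantity"].median().unstack("Gender")
print(medians)

# Hue levels sorted by median, separately for each Area
per_area_order = {
    area: row.sort_values().index.tolist()
    for area, row in medians.iterrows()
}
print(per_area_order)  # {'XXX': ['Men', 'Women'], 'YYY': ['Women', 'Men']}
```

The two areas need opposite hue orders, so a single hue_order list cannot sort both groups by median.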

Also, extending boxplot() is not that simple, because seaborn does not expose the classes responsible for plotting in its official API. See the entry point to boxplot() in the seaborn source (permalink to the seaborn master version as of 20.10.2018, git hash: 84ca6c6).

If you are not afraid of working with seaborn's internal objects, you can create your own version, sorted_boxplot(). Possibly the simplest way to achieve the ordering is to modify the following line in _BoxPlotter.draw_boxplot() (permalink, git hash: 84ca6c6):

# Original
center = i + offsets[j]

# Fix:
ordered_offsets = ...
center = i + ordered_offsets[j]

center refers to the position of the boxplot, i is the index of the group, and j is the index of the current hue. I tested this by deriving from _BoxPlotter and overriding draw_boxplot(); see the SortedBoxPlotter class further down.
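One detail to watch when computing the reordered offsets: since the offsets are indexed by the hue index j, each hue needs the rank of its median, not the raw np.argsort() result. The two coincide for two hue levels but can differ for three or more. A minimal sketch with made-up numbers (the offsets and medians below are hypothetical, not taken from seaborn):

```python
import numpy as np

# Hypothetical offsets for three hue levels, from left to right
offsets = np.array([-0.25, 0.0, 0.25])
# Hypothetical medians of hue levels 0, 1, 2 within one group
medians = np.array([3.0, 1.0, 2.0])

# argsort lists the hue indices from smallest to largest median
by_median = np.argsort(medians)      # [1, 2, 0]
# argsort of that result gives the rank of each hue's median
ranks = np.argsort(by_median)        # [2, 0, 1]

# Rank-based lookup: hue 0 has the largest median and goes rightmost
ordered_offsets = offsets[ranks]     # [0.25, -0.25, 0.0]
# Plain argsort lookup puts hue 0 in the middle, which is wrong here
naive_offsets = offsets[by_median]   # [0.0, 0.25, -0.25]

print(ordered_offsets, naive_offsets)
```

With ordered_offsets, hue j is drawn at i + ordered_offsets[j], so the boxes appear left to right by increasing median.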

PS: Would be great if someone elaborates a bit more on this to suggest a pull request for seaborn. The feature certainly is useful.


The following works for me (Python 3.6, seaborn 0.9.0):

import matplotlib.pyplot as plt  # needed for plt.gca() in sorted_boxplot()
import numpy as np
import seaborn as sns
from seaborn.categorical import _BoxPlotter
from seaborn.utils import remove_na

class SortedBoxPlotter(_BoxPlotter):
    def __init__(self, *args, **kwargs):
        super(SortedBoxPlotter, self).__init__(*args, **kwargs)

    def draw_boxplot(self, ax, kws):
        '''
        The code below is partly copied from seaborn/categorical.py
        and is reproduced for educational purposes only.
        '''
        if self.plot_hues is None:
            # Sorting by hue doesn't apply without a hue variable. Just
            # fall back to the default implementation.
            return super(SortedBoxPlotter, self).draw_boxplot(ax, kws)

        vert = self.orient == "v"
        props = {}
        for obj in ["box", "whisker", "cap", "median", "flier"]:
            props[obj] = kws.pop(obj + "props", {})

        for i, group_data in enumerate(self.plot_data):

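            # ==> Sort offsets by median: rank each hue level by its median
            # (argsort of argsort), so that the hue with the smallest median
            # gets the leftmost offset. A NaN median (no data, or missing
            # values) ranks last.
            offsets = self.hue_offsets
            medians = [ np.median(group_data[self.plot_hues[i] == h])
                        for h in self.hue_names ]
            ranks = np.argsort(np.argsort(medians))
            offsets_sorted = offsets[ranks]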

            # Draw nested groups of boxes
            for j, hue_level in enumerate(self.hue_names):

                # Add a legend for this hue level
                if not i:
                    self.add_legend_data(ax, self.colors[j], hue_level)

                # Handle case where there is data at this level
                if group_data.size == 0:
                    continue

                hue_mask = self.plot_hues[i] == hue_level
                box_data = remove_na(group_data[hue_mask])

                # Handle case where there is no non-null data
                if box_data.size == 0:
                    continue

                # ==> Fix ordering
                center = i + offsets_sorted[j]

                artist_dict = ax.boxplot(box_data,
                                         vert=vert,
                                         patch_artist=True,
                                         positions=[center],
                                         widths=self.nested_width,
                                         **kws)
                self.restyle_boxplot(artist_dict, self.colors[j], props)

def sorted_boxplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
                   orient=None, color=None, palette=None, saturation=.75,
                   width=.8, dodge=True, fliersize=5, linewidth=None,
                   whis=1.5, notch=False, ax=None, **kwargs):

    '''
    Same as sns.boxplot(), except that nested groups of boxes are plotted by
    increasing median.
    '''

    plotter = SortedBoxPlotter(x, y, hue, data, order, hue_order,
                               orient, color, palette, saturation,
                               width, dodge, fliersize, linewidth)
    if ax is None:
        ax = plt.gca()
    kwargs.update(dict(whis=whis, notch=notch))
    plotter.plot(ax, kwargs)
    return ax

To run with your sample data:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame([["XXX", "Men",   115],
                   ["XXX", "Men",   105],
                   ["XXX", "Men",   114],
                   ["YYY", "Men",   100],
                   ["YYY", "Men",   90],
                   ["YYY", "Men",   95],
                   ["YYY", "Men",   101],
                   ["XXX", "Women", 120],
                   ["XXX", "Women", 122],
                   ["XXX", "Women", 115],
                   ["XXX", "Women", 117],
                   ["YYY", "Women", 91],
                   ["YYY", "Women", 90],
                   ["YYY", "Women", 90]],
                  columns=["Area", "Gender", "Quantity"])
sorted_boxplot(x="Area", y="Quantity", hue="Gender", data=df, palette="Set3")
plt.show()

Result:

[Image: box plot of the sample data, with the boxes inside each Area group ordered by increasing median]

Upvotes: 5
