Reputation: 4877
Is there a way to sort the x-axis of a grouped box plot in pandas? By default the groups seem to be sorted in ascending order, and I would like them to be ordered by some other column's values instead.
Upvotes: 4
Views: 2143
Reputation: 248
Building on the solution posted by krieger, the short answer is to convert the category column to an ordered CategoricalDtype, listing the categories in the order you want them plotted:
ordered_list = ['dog', 'cat', 'mouse']
df['category'] = df['category'].astype(pd.CategoricalDtype(ordered_list, ordered=True))
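To make the cast above easier to try out, here is a minimal self-contained sketch. The sample data and the 'value' column are assumptions made up for illustration. It checks the group order through groupby, which, as far as I can tell, is what the `by` argument of boxplot uses internally.

```python
import pandas as pd

# Hypothetical sample data; the 'value' column is an assumption for illustration.
df = pd.DataFrame({
    'category': ['cat', 'mouse', 'dog', 'cat', 'dog', 'mouse'],
    'value': [3.0, 1.0, 5.0, 4.0, 6.0, 2.0],
})

# Cast to an ordered categorical with the desired plotting order.
ordered_list = ['dog', 'cat', 'mouse']
df['category'] = df['category'].astype(pd.CategoricalDtype(ordered_list, ordered=True))

# Groups now come out in the categorical order rather than alphabetically.
group_order = list(df.groupby('category', observed=False)['value'].mean().index)
print(group_order)  # ['dog', 'cat', 'mouse']
```

Calling df.boxplot(column='value', by='category') after the cast should then draw the boxes in that same order.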
Upvotes: 2
Reputation: 51
If you're grouping by a category, convert it to an ordered categorical whose categories are listed in the order you want on the x-axis.
See the example below. It creates a dataset with three categories, A, B and C, whose mean values increase in the order C, B, A. The goal is to plot the categories in order of their mean value.
The box plot groups by the category column, so once that column is an ordered categorical the boxes follow the categorical order instead of the default alphabetical one.
import numpy as np
import pandas as pd

# create some data: three categories whose true means decrease from A to C
n = 50
a = pd.concat([pd.Series(['A'] * n, name='cat'),
               pd.Series(np.random.normal(1, 1, n), name='val')],
              axis=1)
b = pd.concat([pd.Series(['B'] * n, name='cat'),
               pd.Series(np.random.normal(.5, 1, n), name='val')],
              axis=1)
c = pd.concat([pd.Series(['C'] * n, name='cat'),
               pd.Series(np.random.normal(0, 1, n), name='val')],
              axis=1)
df = pd.concat([a, b, c]).reset_index(drop=True)

# unordered boxplot (categories appear in alphabetical order)
df.boxplot(column='val', by='cat')

# get the category order by ascending mean
means = df.groupby('cat')['val'].mean().sort_values()
ordered_cats = means.index.values

# create categorical data type and set categorical column as new data type
cat_dtype = pd.CategoricalDtype(ordered_cats, ordered=True)
df['cat'] = df['cat'].astype(cat_dtype)

# ordered boxplot (categories appear in order of their mean)
df.boxplot(column='val', by='cat')
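Since the question asks about ordering by some other column, the same idea can be wrapped in a small helper. This is only a sketch: the function name order_categories_by, its parameters, and the 'priority' column are all hypothetical and not part of pandas.

```python
import pandas as pd

def order_categories_by(df, cat_col, key_col, agg='median', ascending=True):
    """Hypothetical helper: return a copy of df in which cat_col is an ordered
    categorical, ordered by an aggregate of key_col within each category."""
    order = (df.groupby(cat_col)[key_col]
               .agg(agg)
               .sort_values(ascending=ascending)
               .index.tolist())
    out = df.copy()
    out[cat_col] = out[cat_col].astype(pd.CategoricalDtype(order, ordered=True))
    return out

# Hypothetical data: 'priority' is the "other column" that drives the order.
df = pd.DataFrame({
    'cat': ['A', 'A', 'B', 'B', 'C', 'C'],
    'val': [1.0, 1.2, 0.4, 0.6, 0.1, -0.1],
    'priority': [2, 2, 3, 3, 1, 1],
})

ordered = order_categories_by(df, 'cat', 'priority', agg='first')
categories = list(ordered['cat'].cat.categories)
print(categories)  # ['C', 'A', 'B']
```

Passing the result to ordered.boxplot(column='val', by='cat') should then plot the boxes in the order C, A, B. Returning a copy keeps the original frame untouched, which makes it easy to compare the two plots.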
Upvotes: 4