BhishanPoudel

Reputation: 17154

How to show label names in pandas groupby histogram plot

I can plot multiple histograms in a single plot using pandas, but a few things are missing:

  1. How do I give each histogram a label?
  2. I can only plot one figure. How do I change it to layout=(3,1) or something else?
  3. In figure 1, all the bins are filled with solid colors, so it is difficult to tell which is which. How do I fill them with different patterns (e.g. crosses, slashes, etc.)?

Here is the MWE:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('iris')

df.groupby('species')['sepal_length'].hist(alpha=0.7,label='species')
plt.legend()

Output: [image]

To change the layout I can use the by keyword, but then I can't give the groups different colors.

How can I give them different colors?

df.hist('sepal_length',by='species',layout=(3,1))
plt.tight_layout()

Gives: [image]

Upvotes: 4

Views: 7818

Answers (3)

ajc327

Reputation: 49

Since pandas version 1.1.0 you can simply set the legend keyword to True.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('iris')

df.groupby('species')['sepal_length'].hist(alpha=0.7, legend=True)

Output: [image]

Upvotes: 4

Quang Hoang

Reputation: 150735

You can resort to groupby and loop over the groups:

fig,ax = plt.subplots()

hatches = ('\\', '//', '..')         # fill pattern
for (i, d),hatch in zip(df.groupby('species'), hatches):
    d['sepal_length'].hist(alpha=0.7, ax=ax, label=i, hatch=hatch)

ax.legend()

Output: [image]
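
The same loop can probably be stretched to cover the layout part of the question as well: one axes per species, each with its own color, hatch and legend entry. This is only a rough sketch, not something from the original answer. It assumes a synthetic stand-in for the iris data (so it runs without downloading anything) and takes the colors from matplotlib's default color cycle; the Agg backend line is there only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the iris data (assumption: made-up values,
# only the two column names match the question)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "species": np.repeat(["setosa", "versicolor", "virginica"], 50),
    "sepal_length": np.concatenate([
        rng.normal(5.0, 0.35, 50),
        rng.normal(5.9, 0.50, 50),
        rng.normal(6.6, 0.60, 50),
    ]),
})

groups = df.groupby("species")
hatches = ("\\", "//", "..")                                # fill pattern per group
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]  # default color cycle

# One row per group, i.e. the layout=(3,1) arrangement from the question
fig, axes = plt.subplots(nrows=groups.ngroups, sharex=True, figsize=(4, 8))

for ax, (name, grp), hatch, color in zip(axes, groups, hatches, colors):
    grp["sepal_length"].hist(ax=ax, alpha=0.7, color=color,
                             hatch=hatch, label=name)
    ax.legend()

fig.tight_layout()
```

With the real data, replacing the synthetic frame with sns.load_dataset('iris') should be enough.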

Upvotes: 7

ALollz

Reputation: 59539

It's more code, but using pure matplotlib gives you more control over the plots. For your second case (one subplot per species, each with its own color):

import matplotlib.pyplot as plt
import numpy as np
from itertools import zip_longest

# Dictionary of color for each species
color_d = dict(zip_longest(df.species.unique(), 
                           plt.rcParams['axes.prop_cycle'].by_key()['color']))

# Use the same bin edges for each species (20 edges give 19 bins)
xmin = df.sepal_length.min()
xmax = df.sepal_length.max()
bins = np.linspace(xmin, xmax, 20)

# Set up correct number of subplots, space them out. 
fig, ax = plt.subplots(nrows=df.species.nunique(), figsize=(4,8))
plt.subplots_adjust(hspace=0.4)

for i, (lab, gp) in enumerate(df.groupby('species')):
    ax[i].hist(gp.sepal_length, ec='k', bins=bins, color=color_d[lab])
    ax[i].set_title(lab)

    # same xlim for each so we can see differences
    ax[i].set_xlim(xmin, xmax)

[image]
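
One detail that is easy to miss in the snippet above: np.linspace(xmin, xmax, 20) returns 20 bin edges, so each histogram ends up with 19 bins, not 20. A quick check with np.histogram, using a handful of made-up values in place of the real column:

```python
import numpy as np

# Made-up sample values (assumption: stand-in for df.sepal_length)
values = np.array([4.3, 5.0, 5.8, 6.4, 7.9])

# 20 evenly spaced edges between the minimum and the maximum
edges = np.linspace(values.min(), values.max(), 20)

# Passing an array as `bins` means "use these edges"
counts, returned_edges = np.histogram(values, bins=edges)

print(len(edges), len(counts))
```

If exactly 20 bins are wanted, asking linspace for 21 edges should do it.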

Upvotes: 1
