joelostblom

Reputation: 48909

Plotting error bars on grouped bars in pandas

I can plot error bars on single series barplots like so:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame([[4, 6, 1, 3], [5, 7, 5, 2]], columns=['mean1', 'mean2', 'std1', 'std2'], index=['A', 'B'])
print(df)
#    mean1  mean2  std1  std2
# A      4      6     1     3
# B      5      7     5     2

df['mean1'].plot(kind='bar', yerr=df['std1'], alpha=0.5, error_kw=dict(ecolor='k'))
plt.show()


As expected, the mean of index A is paired with the standard deviation of the same index, and the error bar extends +/- that value around the mean.
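The pairing described above can be checked by reading back the error-bar segments matplotlib actually drew. This is a minimal sketch assuming a recent pandas and matplotlib (not the 0.13-era versions in the traceback below): pandas forwards `yerr` to `Axes.bar`, and each resulting `BarContainer` keeps its error bars in `.errorbar`, whose `.lines[2]` holds the vertical line collections. The helper `errorbar_extents` is hypothetical, and `matplotlib.use("Agg")` is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.container import BarContainer

df = pd.DataFrame([[4, 6, 1, 3], [5, 7, 5, 2]],
                  columns=['mean1', 'mean2', 'std1', 'std2'], index=['A', 'B'])


def errorbar_extents(ax):
    """Hypothetical helper: (low, high) end of every error bar, one list per bar series."""
    extents = []
    for container in ax.containers:
        # Axes.bar also registers a separate ErrorbarContainer; only look at the bars
        if isinstance(container, BarContainer) and container.errorbar is not None:
            # errorbar.lines = (data_line, caplines, barlinecols)
            segments = container.errorbar.lines[2][0].get_segments()
            extents.append([(float(s[0][1]), float(s[1][1])) for s in segments])
    return extents


fig, ax = plt.subplots()
df['mean1'].plot(kind='bar', yerr=df['std1'], alpha=0.5, error_kw=dict(ecolor='k'), ax=ax)
print(errorbar_extents(ax))
```

For index A the bar should run from 4 - 1 to 4 + 1, and for index B from 5 - 5 to 5 + 5, which is the symmetric behaviour described above.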

However, when I try to plot both 'mean1' and 'mean2' in the same plot I cannot use the standard deviations in the same way:

df[['mean1', 'mean2']].plot(kind='bar', yerr=df[['std1', 'std2']], alpha = 0.5,error_kw=dict(ecolor='k'))

Traceback (most recent call last):

  File "<ipython-input-587-23614d88a3c5>", line 1, in <module>
    df[['mean1', 'mean2']].plot(kind='bar', yerr=df[['std1', 'std2']], alpha = 0.5,error_kw=dict(ecolor='k'))

  File "C:\Users\name\Dropbox\Tools\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\pandas\tools\plotting.py", line 1705, in plot_frame
    plot_obj.generate()

  File "C:\Users\name\Dropbox\Tools\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\pandas\tools\plotting.py", line 878, in generate
    self._make_plot()

  File "C:\Users\name\Dropbox\Tools\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\pandas\tools\plotting.py", line 1534, in _make_plot
    start=start, label=label, **kwds)

  File "C:\Users\name\Dropbox\Tools\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\pandas\tools\plotting.py", line 1481, in f
    return ax.bar(x, y, w, bottom=start,log=self.log, **kwds)

  File "C:\Users\name\Dropbox\Tools\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\matplotlib\axes.py", line 5075, in bar
    fmt=None, **error_kw)

  File "C:\Users\name\Dropbox\Tools\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\matplotlib\axes.py", line 5749, in errorbar
    iterable(yerr[0]) and iterable(yerr[1])):

  File "C:\Users\name\Dropbox\Tools\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\pandas\core\frame.py", line 1635, in __getitem__
    return self._getitem_column(key)

  File "C:\Users\name\Dropbox\Tools\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\pandas\core\frame.py", line 1642, in _getitem_column
    return self._get_item_cache(key)

  File "C:\Users\name\Dropbox\Tools\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\pandas\core\generic.py", line 983, in _get_item_cache
    values = self._data.get(item)

  File "C:\Users\name\Dropbox\Tools\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\pandas\core\internals.py", line 2754, in get
    _, block = self._find_block(item)

  File "C:\Users\name\Dropbox\Tools\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\pandas\core\internals.py", line 3065, in _find_block
    self._check_have(item)

  File "C:\Users\name\Dropbox\Tools\WinPython-64bit-2.7.6.2\python-2.7.6.amd64\lib\site-packages\pandas\core\internals.py", line 3072, in _check_have
    raise KeyError('no item named %s' % com.pprint_thing(item))

KeyError: u'no item named 0'

The closest I have gotten to my desired output is this:

df[['mean1', 'mean2']].plot(kind='bar', yerr=df[['std1', 'std2']].values.T, alpha = 0.5,error_kw=dict(ecolor='k'))


But now the error bars are not plotted symmetrically. Instead, the blue and green bars in each group use the same pair of negative and positive errors, and this is where I am stuck. How can I get the error bars of my multi-series bar plot to look the way they did with only one series?
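One plausible reading of that asymmetry: matplotlib's `Axes.bar` documents that a `yerr` of shape `(2, N)` means separate lower and upper errors for N bars, so a 2×2 array can be taken as "lower errors = first row, upper errors = second row" rather than "one row per series". The sketch below uses matplotlib alone with the question's numbers to show that interpretation; whether pandas 0.13 forwarded the whole array to each series in exactly this way is an assumption, not something verified here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

means = [4, 5]                 # mean1 for A and B
yerr = np.array([[1, 5],       # row 0 (std1 values): read as the lower errors
                 [3, 2]])      # row 1 (std2 values): read as the upper errors

fig, ax = plt.subplots()
bars = ax.bar(['A', 'B'], means, yerr=yerr, alpha=0.5, error_kw=dict(ecolor='k'))

# one vertical segment per bar: (x, low) -> (x, high)
segments = bars.errorbar.lines[2][0].get_segments()
lower_ends = [float(seg[0][1]) for seg in segments]   # mean - row 0
upper_ends = [float(seg[1][1]) for seg in segments]   # mean + row 1
print(lower_ends, upper_ends)
```

Bar A then spans 3 to 7 and bar B spans 0 to 7, so neither error bar is centred on its mean.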

Update: This seems to be fixed in pandas 0.14; I had been reading the docs for 0.13. I can't upgrade pandas right now, but I will later and see how it turns out.

Upvotes: 11

Views: 9837

Answers (1)

velodrome

Reputation: 216

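  • yerr=df[['std1', 'std2']] in the OP doesn't work, because pandas matches a DataFrame of errors to the plotted columns by name, and 'std1' and 'std2' do not match 'mean1' and 'mean2'
  • Using df[['std1', 'std2']].to_numpy().T bypasses the issue by passing a plain array without column names; after the transpose, each row holds the errors for one plotted column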
  • Tested in python 3.8.11, pandas 1.3.3, matplotlib 3.4.3
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame([[4, 6, 1, 3], [5, 7, 5, 2]], columns=['mean1', 'mean2', 'std1', 'std2'], index=['A', 'B'])
print(df)
#    mean1  mean2  std1  std2
# A      4      6     1     3
# B      5      7     5     2

# convert the std columns to an array with one row per plotted column
yerr = df[['std1', 'std2']].to_numpy().T
print(yerr)
# [[1 5]
#  [3 2]]

df[['mean1', 'mean2']].plot(kind='bar', yerr=yerr, alpha=0.5, error_kw=dict(ecolor='k'))
plt.show()
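The transpose is what pairs each std column with its mean column: for a frame with M plotted columns and N rows, pandas reads a raw 2D error array as M rows (one per plotted column) of N values each. A minimal sketch comparing the array with and without `.T`, assuming a recent pandas and matplotlib where each bar series is a `BarContainer` with its error bars in `.errorbar`; `plotted_extents` is a hypothetical helper that returns the (low, high) end of every error bar, series by series.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.container import BarContainer

df = pd.DataFrame([[4, 6, 1, 3], [5, 7, 5, 2]],
                  columns=['mean1', 'mean2', 'std1', 'std2'], index=['A', 'B'])


def plotted_extents(yerr):
    """Hypothetical helper: plot mean1/mean2 with a raw yerr, return (low, high) per bar."""
    fig, ax = plt.subplots()
    df[['mean1', 'mean2']].plot(kind='bar', yerr=yerr, ax=ax)
    extents = []
    for container in ax.containers:
        if isinstance(container, BarContainer) and container.errorbar is not None:
            # errorbar.lines = (data_line, caplines, barlinecols)
            segments = container.errorbar.lines[2][0].get_segments()
            extents.append([(float(s[0][1]), float(s[1][1])) for s in segments])
    plt.close(fig)
    return extents


# transposed: row 0 = std1 (for mean1), row 1 = std2 (for mean2)
with_t = plotted_extents(df[['std1', 'std2']].to_numpy().T)
# not transposed: row 0 = the errors of index A, row 1 = the errors of index B
without_t = plotted_extents(df[['std1', 'std2']].to_numpy())
print(with_t)
print(without_t)
```

With the transpose, mean1 gets +/-1 and +/-5 and mean2 gets +/-3 and +/-2. Without it, mean1 would get +/-1 and +/-3 (the errors of row A); because this frame is 2×2, the untransposed array has the same shape, so nothing fails loudly.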

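Following the first point of the answer, another option is to keep a DataFrame for `yerr` and rename its columns to match the plotted ones, since the pandas documentation describes a DataFrame (or dict) of errors as being matched to the plotted columns by name. This is a minimal sketch assuming a recent pandas and matplotlib; the `rename` mapping is just one way to build the matching frame, and the read-back loop relies on each `BarContainer` exposing its error bars through `.errorbar`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.container import BarContainer

df = pd.DataFrame([[4, 6, 1, 3], [5, 7, 5, 2]],
                  columns=['mean1', 'mean2', 'std1', 'std2'], index=['A', 'B'])

# give the error frame the same column names as the plotted frame
yerr = df[['std1', 'std2']].rename(columns={'std1': 'mean1', 'std2': 'mean2'})
# a dict keyed by plotted column, e.g. {'mean1': df['std1'], 'mean2': df['std2']},
# should be matched the same way

fig, ax = plt.subplots()
df[['mean1', 'mean2']].plot(kind='bar', yerr=yerr, alpha=0.5, error_kw=dict(ecolor='k'), ax=ax)

# collect the (low, high) end of every error bar, one list per series in plotting order
extents = []
for container in ax.containers:
    if isinstance(container, BarContainer) and container.errorbar is not None:
        segments = container.errorbar.lines[2][0].get_segments()
        extents.append([(float(s[0][1]), float(s[1][1])) for s in segments])
print(extents)
```

Each bar should be centred on its mean: mean1 spans 4 +/- 1 and 5 +/- 5, mean2 spans 6 +/- 3 and 7 +/- 2.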

Upvotes: 15
