rhombuzz

Scatterplot of pandas DataFrame ends in KeyError: 0

After updating pandas (0.23.4) and matplotlib (3.0.1), I get a strange error when trying to do something like the following:

import pandas as pd
import matplotlib.pyplot as plt


clrdict = {1: "#a6cee3", 2: "#1f78b4", 3: "#b2df8a", 4: "#33a02c"}

df_full = pd.DataFrame({'x':[20,30,30,40],
                        'y':[25,20,30,25],
                        's':[100,200,300,400],
                        'l':[1,2,3,4]})

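# Map each label in column 'l' to a hex colour string
df_full['c'] = df_full['l'].replace(clrdict)

# Keep only the rows with x == 30; they keep their original index labels (1 and 2)
df_part = df_full[(df_full.x == 30)]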

fig = plt.figure()
plt.scatter(x=df_full['x'],
            y=df_full['y'],
            s=df_full['s'],
            c=df_full['c'])
plt.show()

fig = plt.figure()
plt.scatter(x=df_part['x'],
            y=df_part['y'],
            s=df_part['s'],
            c=df_part['c'])
plt.show()

The scatterplot of the original DataFrame (df_full) is shown without problems, but plotting the filtered DataFrame (df_part) raises the following error:

Traceback (most recent call last):
  File "G:\data\project\test.py", line 27, in <module>
    c=df_part['c'])
  File "C:\Program Files\Python37\lib\site-packages\matplotlib\pyplot.py", line 2864, in scatter
    is not None else {}), **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\matplotlib\__init__.py", line 1805, in inner
    return func(ax, *args, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\matplotlib\axes\_axes.py", line 4195, in scatter
    isinstance(c[0], str))):
  File "C:\Program Files\Python37\lib\site-packages\pandas\core\series.py", line 767, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Program Files\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3118, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 114, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

This is caused by the color option c=df_part['c']: if I leave it out, the problem doesn't occur. This didn't happen before the updates, so you may not be able to reproduce it with older versions of matplotlib or pandas (I have no idea which one causes it).

In my project the line df_part = df_full[(df_full.x == i)] is used within the update function of a matplotlib.animation.FuncAnimation. The result is an animation over the values of x (which are timestamps in my project), so I need a way to filter the DataFrame.

Answers (1)

ImportanceOfBeingErnest

This is a bug which got fixed by https://github.com/matplotlib/matplotlib/pull/12673.

It should hopefully be available in the next bugfix release, 3.0.2, which should be out within the next few days.

In the meantime, you can pass the NumPy array underlying the pandas Series instead, via series.values, e.g. c=df_part['c'].values.
