Reputation: 97
After updating pandas (0.23.4) and matplotlib (3.0.1), I get a strange error when I try to do something like the following:
import pandas as pd
import matplotlib.pyplot as plt

clrdict = {1: "#a6cee3", 2: "#1f78b4", 3: "#b2df8a", 4: "#33a02c"}
df_full = pd.DataFrame({'x': [20, 30, 30, 40],
                        'y': [25, 20, 30, 25],
                        's': [100, 200, 300, 400],
                        'l': [1, 2, 3, 4]})
df_full['c'] = df_full['l'].replace(clrdict)
df_part = df_full[(df_full.x == 30)]

fig = plt.figure()
plt.scatter(x=df_full['x'],
            y=df_full['y'],
            s=df_full['s'],
            c=df_full['c'])
plt.show()

fig = plt.figure()
plt.scatter(x=df_part['x'],
            y=df_part['y'],
            s=df_part['s'],
            c=df_part['c'])
plt.show()
The scatter plot of the original DataFrame (df_full) is shown without problems, but the plot of the partial DataFrame (df_part) raises the following error:
Traceback (most recent call last):
  File "G:\data\project\test.py", line 27, in <module>
    c=df_part['c'])
  File "C:\Program Files\Python37\lib\site-packages\matplotlib\pyplot.py", line 2864, in scatter
    is not None else {}), **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\matplotlib\__init__.py", line 1805, in inner
    return func(ax, *args, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\matplotlib\axes\_axes.py", line 4195, in scatter
    isinstance(c[0], str))):
  File "C:\Program Files\Python37\lib\site-packages\pandas\core\series.py", line 767, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\Program Files\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3118, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 114, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 958, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 964, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
This is caused by the color option c=df_part['c']. If you leave it out, the problem doesn't occur. This didn't happen before the updates, so you may not be able to reproduce it with older versions of matplotlib or pandas (I have no idea which of the two causes it).
In my project, the line df_part = df_full[(df_full.x == i)] is used within the update function of a matplotlib.animation.FuncAnimation. The result is an animation over the values of x (which are timestamps in my project), so I need a way to take a subset of the DataFrame.
Upvotes: 1
Views: 1037
Reputation: 339260
This is a bug in matplotlib that was fixed by https://github.com/matplotlib/matplotlib/pull/12673.
The fix should hopefully be available in the next bugfix release, 3.0.2, which should be out within the next few days.
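Judging from the traceback in the question, the expression that fails is c[0] on the Series passed as c. Here is a minimal sketch of that lookup without matplotlib. The data is taken from the question; the names lookup_failed, first_by_position and first_from_array are only for illustration:

```python
import pandas as pd

df_full = pd.DataFrame({"x": [20, 30, 30, 40],
                        "c": ["#a6cee3", "#1f78b4", "#b2df8a", "#33a02c"]})

# Boolean filtering keeps the original index labels (here 1 and 2),
# so the filtered frame has no row labelled 0.
df_part = df_full[df_full.x == 30]

try:
    # With an integer index, [0] is a label lookup, not a positional one.
    df_part["c"][0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional access works on the Series via .iloc, and on the underlying array.
first_by_position = df_part["c"].iloc[0]
first_from_array = df_part["c"].values[0]
```

For df_full the index starts at 0, which would explain why the first plot in the question works and only the filtered one fails.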
In the meantime, you can pass the underlying numpy array of the pandas Series instead, i.e. series.values.
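Applied to the example from the question, the workaround could look roughly like this. It is a sketch, not the only way to do it: the Agg backend is chosen only so that it runs without a display, .map is used in place of .replace to build the color column, and ax and sc are illustrative names:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

clrdict = {1: "#a6cee3", 2: "#1f78b4", 3: "#b2df8a", 4: "#33a02c"}
df_full = pd.DataFrame({"x": [20, 30, 30, 40],
                        "y": [25, 20, 30, 25],
                        "s": [100, 200, 300, 400],
                        "l": [1, 2, 3, 4]})
# Map each label to its hex color (the question uses .replace for this step).
df_full["c"] = df_full["l"].map(clrdict)
df_part = df_full[df_full.x == 30]

fig, ax = plt.subplots()
# Pass plain arrays instead of Series so no index labels are involved.
sc = ax.scatter(x=df_part["x"].values,
                y=df_part["y"].values,
                s=df_part["s"].values,
                c=df_part["c"].values)
fig.canvas.draw()
```

Inside a FuncAnimation update function the same idea applies: filter the DataFrame as before and hand .values to the plotting call.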
Upvotes: 3