erm

Reputation: 41

TypeError: unhashable type: 'numpy.ndarray' when trying to create scatter plot from dataset

I am trying to create a scatter plot from a dataset on movies. The goal is to look at the correlation between the different categories and the target variable, which is whether or not the movie won an award. I have called type() on both of my variables, and neither of them appears to be of type numpy.ndarray, as they are both pandas dataframes, yet I still get the following error when I try to create the scatter plot:

TypeError: unhashable type: 'numpy.ndarray'

My code is as follows:

import pandas as pd
import matplotlib.pyplot as plt

file=pd.read_csv('academy_awards.csv',sep=',',error_bad_lines=False,encoding="ISO 8859-1")
print(file)
df=pd.DataFrame(file)

#df=df.dropna(axis=0,how='any')
target=df.Category
X=pd.DataFrame(df.Won)

y=target
#print(type(X))
#print(type(y))

plt.scatter(X,y)

The following are the first few lines of the dataset I am using:

Year,Category,Nominee,Additional Info,Won
2010 (83rd),Actor -- Leading Role,Javier Bardem,Biutiful {'Uxbal'},NO
2010 (83rd),Actor -- Leading Role,Jeff Bridges,True Grit {'Rooster Cogburn'},NO
2010 (83rd),Actor -- Leading Role,Jesse Eisenberg,The Social Network {'Mark Zuckerberg'},NO
2010 (83rd),Actor -- Leading Role,Colin Firth,The King's Speech {'King George VI'},YES
2010 (83rd),Actor -- Leading Role,James Franco,127 Hours {'Aron Ralston'},NO
2010 (83rd),Actor -- Supporting Role,Christian Bale,The Fighter {'Dicky Eklund'},YES

Any help or suggestions are greatly appreciated!

Edit: The following is the full traceback:

-----------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-211-efcb7c41bca1> in <module>
     14 print(y.shape)
     15
---> 16 plt.scatter(X,y)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, data, **kwargs)
   2862         vmin=vmin, vmax=vmax, alpha=alpha, linewidths=linewidths,
   2863         verts=verts, edgecolors=edgecolors, **({"data": data} if data
-> 2864         is not None else {}), **kwargs)
   2865     sci(__ret)
   2866     return __ret

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1808                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1809                         RuntimeWarning, stacklevel=2)
-> 1810             return func(ax, *args, **kwargs)
   1811
   1812         inner.__doc__ = _add_data_doc(inner.__doc__,

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   4170             edgecolors = 'face'
   4171
-> 4172         self._process_unit_info(xdata=x, ydata=y, kwargs=kwargs)
   4173         x = self.convert_xunits(x)
   4174         y = self.convert_yunits(y)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/axes/_base.py in _process_unit_info(self, xdata, ydata, kwargs)
   2133             return kwargs
   2134
-> 2135         kwargs = _process_single_axis(xdata, self.xaxis, 'xunits', kwargs)
   2136         kwargs = _process_single_axis(ydata, self.yaxis, 'yunits', kwargs)
   2137         return kwargs

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/axes/_base.py in _process_single_axis(data, axis, unit_name, kwargs)
   2116                 # We only need to update if there is nothing set yet.
   2117                 if not axis.have_units():
-> 2118                     axis.update_units(data)
   2119
   2120             # Check for units in the kwargs, and if present update axis

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/axis.py in update_units(self, data)
   1471         neednew = self.converter != converter
   1472         self.converter = converter
-> 1473         default = self.converter.default_units(data, self)
   1474         if default is not None and self.units is None:
   1475             self.set_units(default)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/category.py in default_units(data, axis)
    101         # default_units->axis_info->convert
    102         if axis.units is None:
--> 103             axis.set_units(UnitData(data))
    104         else:
    105             axis.units.update(data)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/category.py in __init__(self, data)
    167         self._counter = itertools.count()
    168         if data is not None:
--> 169             self.update(data)
    170
    171     def update(self, data):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/category.py in update(self, data)
    184         data = np.atleast_1d(np.array(data, dtype=object))
    185
--> 186         for val in OrderedDict.fromkeys(data):
    187             if not isinstance(val, (str, bytes)):
    188                 raise TypeError("{val!r} is not a string".format(val=val))

TypeError: unhashable type: 'numpy.ndarray'

Upvotes: 1

Views: 5803

Answers (2)

Kenry Sanchez

Reputation: 1743

First, you don't need df=pd.DataFrame(file). pd.read_csv already returns a DataFrame, so the file variable is one as soon as the CSV has been read.

Then you can call the scatter plot directly on it and choose the x-axis and y-axis columns by name:

df.plot(kind="scatter", x="Won", y="Category")

You don't need to preprocess the data first, because pandas has already parsed it into columns when it read the file.
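As a rough illustration of the point about read_csv, the sketch below loads three rows copied from the question's sample through io.StringIO (a stand-in for academy_awards.csv, which is not available here) and compares a plain column selection with the pd.DataFrame(df.Won) wrapping used in the question. The variable names are only illustrative.

```python
import io

import pandas as pd

# Three rows copied from the question's sample; a stand-in for academy_awards.csv.
csv_text = """Year,Category,Nominee,Additional Info,Won
2010 (83rd),Actor -- Leading Role,Javier Bardem,Biutiful {'Uxbal'},NO
2010 (83rd),Actor -- Leading Role,Colin Firth,The King's Speech {'King George VI'},YES
2010 (83rd),Actor -- Supporting Role,Christian Bale,The Fighter {'Dicky Eklund'},YES
"""

# read_csv already returns a DataFrame; no pd.DataFrame(...) wrapping is needed.
df = pd.read_csv(io.StringIO(csv_text))

# Selecting a column by name gives a 1-D Series.
won = df["Won"]
category = df["Category"]

# Wrapping the column again, as in the question, gives a 2-D object with one column.
wrapped = pd.DataFrame(df["Won"])

print(type(df).__name__, won.shape, wrapped.shape)
```

The shapes suggest why the two variables in the question behave differently even though both come from pandas: one is one-dimensional and the other is a single-column table.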

Upvotes: 1

Alec

Reputation: 9575

Arrays are unhashable because they're mutable. You can make one hashable by converting it to an immutable tuple (by wrapping it with tuple()), but you usually shouldn't be trying to hash arrays anyway. Your data is probably of the wrong shape.
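To make the shape issue concrete, here is a minimal sketch that mimics the two lines from category.py shown in the question's traceback (np.atleast_1d(np.array(data, dtype=object)) followed by OrderedDict.fromkeys). The helper unique_labels is hypothetical and is not part of Matplotlib; the small DataFrame is made-up stand-in data with the same column names as the question.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

# Made-up stand-in data using the question's column names.
df = pd.DataFrame({
    "Category": ["Actor -- Leading Role", "Actor -- Leading Role",
                 "Actor -- Supporting Role"],
    "Won": ["NO", "YES", "YES"],
})

X_2d = pd.DataFrame(df.Won)  # shape (3, 1), as built in the question
x_1d = df["Won"]             # shape (3,), a plain Series


def unique_labels(data):
    """Hypothetical helper: collect unique values by hashing each element,
    following the two lines of category.py quoted in the traceback."""
    arr = np.atleast_1d(np.array(data, dtype=object))
    return list(OrderedDict.fromkeys(arr))


# Iterating a 2-D array yields 1-D ndarray rows, which cannot be hashed.
try:
    unique_labels(X_2d)
    error_2d = None
except TypeError as exc:
    error_2d = str(exc)

# Iterating a 1-D array yields the strings themselves, which hash fine.
labels_1d = unique_labels(x_1d)

# A single row can be made hashable by converting it to a tuple.
row_key = tuple(np.array(["NO"], dtype=object))

print(error_2d)
print(labels_1d)
```

Under these assumptions, passing the one-dimensional column (for example df["Won"] or df.Won) instead of the single-column DataFrame avoids the hashing step failing, which fits the "wrong shape" diagnosis.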

Upvotes: 1
