Reputation: 19
I've been trying my hand at some data visualization with Python and Matplotlib. In this case I'm trying to visualize the amount of data missing per column. I ran a short script to count the missing values per column and stored the result in the array missing_count. I would now like to show this in a bar chart using Matplotlib, but I've run into this issue:
import matplotlib.pyplot as plt
import numpy as np
missing_count = np.array([33597, 0, 0, 0, 0, 0, 0, 12349, 0, 0, 12349, 0, 0, 0, 115946, 47696, 44069, 81604, 5416, 5416, 5416, 5416, 0, 73641, 74331, 187204, 128829, 184118, 116441, 183093, 153048, 187349, 89918, 89918, 89918, 89918, 89918, 89918, 51096, 51096, 51096, 51096, 51096, 51096, 51096, 51096, 51096, 51096])
n = len(missing_count)
index = np.arange(n)
fig, ax = plt.subplots()
r1 = ax.bar(index, n, 0.15, missing_count, color='r')
ax.set_ylabel('NULL values')
ax.set_title('Amount of NULL values per colum')
ax.set_xticks(index + width / 2)
ax.set_xticklabels(list(originalData.columns.values))
plt.show()
Resulting in this error:
ValueError Traceback (most recent call last)
<ipython-input-34-285ca1e9de68> in <module>()
10 fig, ax = plt.subplots()
11
---> 12 r1 = ax.bar(index, n, 0.15, missing_count, color='r')
13
14 ax.set_ylabel('NULL values')
C:\Users\Martien\Anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
1895 warnings.warn(msg % (label_namer, func.__name__),
1896 RuntimeWarning, stacklevel=2)
-> 1897 return func(ax, *args, **kwargs)
1898 pre_doc = inner.__doc__
1899 if pre_doc is None:
C:\Users\Martien\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in bar(self, left, height, width, bottom, **kwargs)
2077 if len(height) != nbars:
2078 raise ValueError("incompatible sizes: argument 'height' "
-> 2079 "must be length %d or scalar" % nbars)
2080 if len(width) != nbars:
2081 raise ValueError("incompatible sizes: argument 'width' "
ValueError: incompatible sizes: argument 'height' must be length 48 or scalar
I've looked at the Matplotlib documentation, which tells me that height should be a scalar, but it does not explain what this scalar is. There is also this example I've followed, which does work when I run it.
I've run out of ideas as to why I get this error; any help would really be appreciated.
Edit: originalData is the original CSV file I read in, I only use it here to name my bars
Upvotes: 0
Views: 409
Reputation: 1980
According to https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.bar.html, the second argument must be height. You're passing n as the second argument, which is a single number.
Try
r1 = ax.bar(index, missing_count, 0.15, color='r')
instead, which should get the job done.
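Even better, be explicit about your argument names (tedious, and harder to keep clean, but a good idea when you have more than a few arguments). The linked docs name the first parameter x; the version in your traceback calls it left, so there you would write left=index instead.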
r1 = ax.bar(x=index, height = missing_count, width = 0.15, color='r')
To expand on that: height holds the count for each bar. Say you had an array of zeros and ones
A = [0,0,0,0,1,1,1]
Counting the values in that array would give a bar plot with two bars: one 4 units high (since you have four zeros) and the other 3 units high.
The command
r1 = ax.bar([0,1], [4,3], 0.15, color='r')
would make a plot with a bar at 0 and a bar at 1. The first bar would be 4 units high, the second 3 units high.
Translating to your code, missing_count corresponds to the counts of the array. That is not A itself, but [Counter([0,0,0,0,1,1,1])[x] for x in Counter([0,0,0,0,1,1,1])] (with Counter imported from collections).
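The counting example above can be sketched end to end as follows. This is only an illustration of how an array of raw values turns into bar heights; the names counts, positions, heights, and bars are introduced here and are not part of the question's code.

```python
from collections import Counter

import matplotlib.pyplot as plt

# Raw values: four zeros and three ones.
A = [0, 0, 0, 0, 1, 1, 1]

# Count how often each distinct value occurs.
counts = Counter(A)

# One bar per distinct value: x position = the value, height = its count.
positions = list(counts.keys())
heights = list(counts.values())

fig, ax = plt.subplots()
bars = ax.bar(positions, heights, 0.15, color='r')
```

Here heights plays the role that missing_count plays in the question: it already holds one count per bar, so it goes straight into the second argument. Calling plt.show() afterwards displays the figure.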
Upvotes: 2
Reputation: 339230
In your code, n is a scalar. You probably do not want the bar height to be constant, but rather the values from missing_count:
ax.bar(index, missing_count, 0.15, color='r')
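For reference, here is a sketch of the whole script with that change applied. Two things are assumptions rather than part of the original: originalData is not available here, so the labels list is a hypothetical stand-in for list(originalData.columns.values), and width is defined explicitly because the question's snippet uses it without assigning it. The ticks are placed at index on the assumption that the bars are centre-aligned.

```python
import matplotlib.pyplot as plt
import numpy as np

# Missing-value counts per column, copied from the question.
missing_count = np.array([33597, 0, 0, 0, 0, 0, 0, 12349, 0, 0, 12349, 0, 0, 0,
                          115946, 47696, 44069, 81604, 5416, 5416, 5416, 5416,
                          0, 73641, 74331, 187204, 128829, 184118, 116441,
                          183093, 153048, 187349, 89918, 89918, 89918, 89918,
                          89918, 89918, 51096, 51096, 51096, 51096, 51096,
                          51096, 51096, 51096, 51096, 51096])

n = len(missing_count)
index = np.arange(n)
width = 0.15  # bar width (assumed; undefined in the question's snippet)

# Hypothetical labels standing in for list(originalData.columns.values).
labels = ['col_%d' % i for i in range(n)]

fig, ax = plt.subplots()

# height is the array of counts, not the scalar n.
bars = ax.bar(index, missing_count, width, color='r')

ax.set_ylabel('NULL values')
ax.set_title('Amount of NULL values per column')
ax.set_xticks(index)  # assumes centre-aligned bars
ax.set_xticklabels(labels, rotation=90)
```

Call plt.show() at the end to display the chart. With 48 columns the rotated labels are easier to read than horizontal ones, but that is a presentation choice, not something the fix depends on.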
Upvotes: 1