Reputation: 6093
I'm attempting to make a violin plot with Python 3.8.10 and matplotlib 3.3.4:
import matplotlib.pyplot as plt
import numpy as np
data = []
data.append([65,46,64,59,42,44])
data.append([20,40,44,43,32,20,27,31,20,40,24,26,37,30,29,25,31,65,50,38,41,19,31,38,48,44,51,55,52,25,40,28,50,37,44,21,43,28,36,67,55,58,23,36,28,21,21,39,26,65,18,27,50,70,29,37,25,49,33,31,20,33])
f = plt.figure()
plt.rc('xtick', labelsize = 6)
violin_plot = plt.violinplot(data, showmeans=False, showmedians=False)
for pc in violin_plot["bodies"]:
    pc.set_edgecolor('black')
def adjacent_values(vals, q1, q3):
    upper_adjacent_value = q3 + (q3 - q1) * 1.5
    upper_adjacent_value = np.clip(upper_adjacent_value, q3, vals[-1])
    lower_adjacent_value = q1 - (q3 - q1) * 1.5
    lower_adjacent_value = np.clip(lower_adjacent_value, vals[0], q1)
    return lower_adjacent_value, upper_adjacent_value
quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1)
whiskers = np.array([
    adjacent_values(sorted_array, q1, q3)
    for sorted_array, q1, q3 in zip(data, quartile1, quartile3)])
whiskers_min, whiskers_max = whiskers[:, 0], whiskers[:, 1]
inds = np.arange(1, len(medians) + 1)
plt.scatter(inds, medians, marker="o", color="white", s=30, zorder=3)
plt.vlines(inds, quartile1, quartile3, color="k", linestyle="-", lw=5)
plt.vlines(inds, whiskers_min, whiskers_max, color="k", linestyle="-", lw=1)
plt.savefig('violin_age_by_race.svg', bbox_inches='tight', pad_inches = 0.05)
which I got from https://matplotlib.org/devdocs/gallery/statistics/customized_violin.html
but the code above generates an error (the line numbers differ from the code above because I trimmed the file down to a minimal working example for Stack Overflow):
/usr/local/lib/python3.8/dist-packages/numpy/core/_asarray.py:171: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
return array(a, dtype, copy=False, order=order, subok=True)
Traceback (most recent call last):
File "/tmp/E2Woujgas1.py", line 35, in <module>
quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1)
File "<__array_function__ internals>", line 5, in percentile
File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 3818, in percentile
return _quantile_unchecked(
File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 3937, in _quantile_unchecked
r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 3495, in _ureduce
axis = _nx.normalize_axis_tuple(axis, nd)
File "/usr/local/lib/python3.8/dist-packages/numpy/core/numeric.py", line 1391, in normalize_axis_tuple
axis = tuple([normalize_axis_index(ax, ndim, argname) for ax in axis])
File "/usr/local/lib/python3.8/dist-packages/numpy/core/numeric.py", line 1391, in <listcomp>
axis = tuple([normalize_axis_index(ax, ndim, argname) for ax in axis])
numpy.AxisError: axis 1 is out of bounds for array of dimension 1
The error is in quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1)
so I did what the message suggests and changed it to
quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1, dtype = object)
but then I get an error:
TypeError: _percentile_dispatcher() got an unexpected keyword argument 'dtype'
As far as I can tell, the error is thrown because the sub-lists have different lengths, which is unavoidable. In the example, every sub-list had 100 elements.
I've also tried making a NumPy array:
np_data = np.array(data, dtype = object)
quartile1, medians, quartile3 = np.percentile(np_data, [25, 50, 75], axis=1, dtype = object)
but these changes give the same error about dtype.
How can I alter this code so that numpy won't complain about different length sub-lists?
Upvotes: 0
Views: 382
Reputation: 231385
The error isn't in violinplot! That works just fine.
It's in the percentile function.
In [23]: np.percentile(data, [25, 50, 75], axis=1)
/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py:3539: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
a = np.asanyarray(a)
Traceback (most recent call last):
File "<ipython-input-23-32c56e5bfa18>", line 1, in <module>
np.percentile(data, [25, 50, 75], axis=1)
File "<__array_function__ internals>", line 5, in percentile
File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 3867, in percentile
return _quantile_unchecked(
File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 3986, in _quantile_unchecked
r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 3544, in _ureduce
axis = _nx.normalize_axis_tuple(axis, nd)
File "/usr/local/lib/python3.8/dist-packages/numpy/core/numeric.py", line 1385, in normalize_axis_tuple
axis = tuple([normalize_axis_index(ax, ndim, argname) for ax in axis])
File "/usr/local/lib/python3.8/dist-packages/numpy/core/numeric.py", line 1385, in <listcomp>
axis = tuple([normalize_axis_index(ax, ndim, argname) for ax in axis])
AxisError: axis 1 is out of bounds for array of dimension 1
data is a list. percentile needs an array, so it converts the list first:
In [25]: type(data)
Out[25]: list
In [26]: np.array(data)
<ipython-input-26-d04fee483c4a>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
np.array(data)
Out[26]:
array([list([65, 46, 64, 59, 42, 44]),
list([20, 40, 44, 43, 32, 20, 27, 31, 20, 40, 24, 26, 37, 30, 29, 25, 31, 65, 50, 38, 41, 19, 31, 38, 48, 44, 51, 55, 52, 25, 40, 28, 50, 37, 44, 21, 43, 28, 36, 67, 55, 58, 23, 36, 28, 21, 21, 39, 26, 65, 18, 27, 50, 70, 29, 37, 25, 49, 33, 31, 20, 33])],
dtype=object)
So you can make an array from data without the warning by specifying dtype=object:
In [30]: np_data=np.array(data, dtype=object)
In [31]: np_data
Out[31]:
array([list([65, 46, 64, 59, 42, 44]),
list([20, 40, 44, 43, 32, 20, 27, 31, 20, 40, 24, 26, 37, 30, 29, 25, 31, 65, 50, 38, 41, 19, 31, 38, 48, 44, 51, 55, 52, 25, 40, 28, 50, 37, 44, 21, 43, 28, 36, 67, 55, 58, 23, 36, 28, 21, 21, 39, 26, 65, 18, 27, 50, 70, 29, 37, 25, 49, 33, 31, 20, 33])],
dtype=object)
But note, it is 1d, an array of lists. Specifying axis=1 is wrong because the array does not have such an axis.
Even without axis, calling percentile on that array of lists doesn't work:
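A quick shape check makes this concrete. The sketch below uses a small hypothetical ragged list (not the data from the question) and only assumes that dtype=object is passed explicitly:

```python
import numpy as np

# Hypothetical ragged input: two sub-lists of different lengths.
ragged = [[65, 46, 64], [20, 40]]

# With dtype=object, NumPy stores each sub-list as a single element.
np_ragged = np.array(ragged, dtype=object)

print(np_ragged.ndim)      # number of dimensions
print(np_ragged.shape)     # one entry per sub-list
print(type(np_ragged[0]))  # each element is still a plain Python list
```

Because there is only axis 0, any call that passes axis=1 fails with the AxisError shown above.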
In [32]: np.percentile(np_data, [25, 50, 75])
Traceback (most recent call last):
File "<ipython-input-32-31dd33e64b74>", line 1, in <module>
np.percentile(np_data, [25, 50, 75])
File "<__array_function__ internals>", line 5, in percentile
....
packages/numpy/lib/function_base.py", line 4009, in _lerp
diff_b_a = subtract(b, a)
TypeError: unsupported operand type(s) for -: 'list' and 'list'
You could do percentile on the 2 lists separately:
In [34]: np.percentile(np_data[0], [25, 50, 75])
Out[34]: array([44.5 , 52.5 , 62.75])
In [35]: np.percentile(np_data[1], [25, 50, 75])
Out[35]: array([26.25, 34.5 , 44. ])
In [36]: np.percentile(data[1], [25, 50, 75])
Out[36]: array([26.25, 34.5 , 44. ])
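Building on that, here is a minimal sketch of how the per-list calls could replace the single axis=1 call in the question. It is one possible approach, not the gallery's method. The second group is shortened for brevity, the name stats is made up for illustration, and the sorted() call is my addition: adjacent_values reads vals[0] and vals[-1] as the minimum and maximum, which only holds for sorted input.

```python
import numpy as np

# Ragged input: groups of different lengths (second group shortened here).
data = [
    [65, 46, 64, 59, 42, 44],
    [20, 40, 44, 43, 32, 20, 27, 31],
]

# One percentile call per group; each row of `stats` is (q1, median, q3).
stats = np.array([np.percentile(group, [25, 50, 75]) for group in data])

# Transpose so that each unpacked array holds one statistic for all groups.
quartile1, medians, quartile3 = stats.T


def adjacent_values(vals, q1, q3):
    # vals must be sorted ascending: vals[0] is the min, vals[-1] the max.
    iqr = q3 - q1
    upper = np.clip(q3 + 1.5 * iqr, q3, vals[-1])
    lower = np.clip(q1 - 1.5 * iqr, vals[0], q1)
    return lower, upper


# Sort each group before computing the whisker ends.
whiskers = np.array([
    adjacent_values(sorted(group), q1, q3)
    for group, q1, q3 in zip(data, quartile1, quartile3)
])
whiskers_min, whiskers_max = whiskers[:, 0], whiskers[:, 1]

print(quartile1, medians, quartile3)
print(whiskers_min, whiskers_max)
```

The resulting quartile1, medians, quartile3, whiskers_min and whiskers_max are ordinary 1d float arrays with one entry per group, so they should drop into the plt.scatter and plt.vlines calls from the question unchanged.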
Upvotes: 1