Reputation: 805
I'm trying to create quartile groups of a variable in a new variable. I'm getting an error message and I'm not sure why.
I wrote:
df.describe().popularity
count 10865.000000
mean 0.646446
std 1.000231
min 0.000065
25% 0.207575
50% 0.383831
75% 0.713857
max 32.985763
Name: popularity, dtype: float64
Then:
bin_edges = ['0.000065', '0.207575','0.383831','0.713857','32.985763']
bin_names = ['low','mod_low','medium','high']
df['popularity_levels']= pd.cut(df['popularity'], bin_edges, labels=bin_names)
df.head()
I'm getting the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-49-b6e8c834de1b> in <module>()
----> 1 df['popularity_levels']= pd.cut(df['popularity'], bin_edges, labels=bin_names)
2 df.head()
/opt/conda/lib/python3.6/site-packages/pandas/core/reshape/tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest)
128 bins = np.asarray(bins)
129 bins = _convert_bin_to_numeric_type(bins, dtype)
--> 130 if (np.diff(bins) < 0).any():
131 raise ValueError('bins must increase monotonically.')
132
/opt/conda/lib/python3.6/site-packages/numpy/lib/function_base.py in diff(a, n, axis)
1766 return diff(a[slice1]-a[slice2], n-1, axis=axis)
1767 else:
-> 1768 return a[slice1]-a[slice2]
1769
1770
TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U9') dtype('<U9') dtype('<U9')
What does the error mean? I think it may have to do with defining the data type of the new variable as a float. Is that right? How can I fix it?
Upvotes: 1
Views: 3038
Reputation: 375495
The values in bin_edges should be floats, not strings:
bin_edges = ['0.000065', '0.207575','0.383831','0.713857','32.985763']
# should instead be
bin_edges = [0.000065, 0.207575, 0.383831, 0.713857, 32.985763]
The error occurs because pd.cut converts this list to a NumPy array (the np.asarray(bins) call in the traceback), and a list of strings becomes a string array:
In [11]: np.array(['0.000065', '0.207575','0.383831','0.713857','32.985763'])
Out[11]:
array(['0.000065', '0.207575', '0.383831', '0.713857', '32.985763'],
dtype='<U9')
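(Here dtype='<U9' means a Unicode string of up to 9 characters.) Subtraction is not defined for string arrays, and np.diff subtracts neighbouring elements, so it fails in the same way as: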
In [12]: np.array(['0.000065', '0.207575','0.383831','0.713857','32.985763']) - 1
TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U9') dtype('<U9') dtype('<U9')
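For completeness, here is a minimal sketch of the corrected call. The small DataFrame is a made-up stand-in for the question's data, and passing include_lowest=True is my own addition: without it the first bin is open on the left, so the row holding the minimum value would get NaN. The pd.qcut line is an alternative I'm suggesting, not something from the question; it computes the quartile edges itself.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df
df = pd.DataFrame({
    "popularity": [0.000065, 0.1, 0.207575, 0.3, 0.383831,
                   0.5, 0.713857, 1.0, 32.985763]
})

# If the edges arrive as strings, convert them to floats first
raw_edges = ['0.000065', '0.207575', '0.383831', '0.713857', '32.985763']
bin_edges = [float(edge) for edge in raw_edges]
bin_names = ['low', 'mod_low', 'medium', 'high']

# include_lowest=True keeps the minimum value inside the first bin
df['popularity_levels'] = pd.cut(
    df['popularity'], bin_edges, labels=bin_names, include_lowest=True
)

# Alternative: let pandas compute the quartile edges from the data
df['popularity_quartiles'] = pd.qcut(df['popularity'], q=4, labels=bin_names)

print(df)
```

With numeric edges, np.diff can subtract them and the monotonicity check in pd.cut passes.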
Upvotes: 2