Reputation: 349
I'm trying to normalize my CSV data with decimal scaling, using this code:
def decimal_scaling(data):
    data = np.array(data, dtype=np.float32)
    max_row = data.max(axis=0)
    c = np.array([len(str(int(number))) for number in np.abs(max_row)])
    return data/(10**c)

X = decimal_scaling(
    glcm_df[['dissimilarity_0', 'dissimilarity_45', 'dissimilarity_90', 'dissimilarity_135',
             'correlation_0', 'correlation_45', 'correlation_90', 'correlation_135',
             'homogeneity_0', 'homogeneity_45', 'homogeneity_90', 'homogeneity_135',
             'contrast_0', 'contrast_45', 'contrast_90', 'contrast_135',
             'ASM_0', 'ASM_45', 'ASM_90', 'ASM_135',
             'energy_0', 'energy_45', 'energy_90', 'energy_135']].values)
But every time I run it, I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-5b1233475b8c> in <module>
22 'contrast_0', 'contrast_45', 'contrast_90', 'contrast_135',
23 'ASM_0', 'ASM_45', 'ASM_90', 'ASM_135',
---> 24 'energy_0', 'energy_45', 'energy_90', 'energy_135']].values)
<ipython-input-21-5b1233475b8c> in decimal_scaling(data)
13 data = np.array(data, dtype=np.float32)
14 max_row = data.max(axis=0)
---> 15 c = np.array([len(str(int(number))) for number in np.abs(max_row)])
16 return data/(10**c)
17
<ipython-input-21-5b1233475b8c> in <listcomp>(.0)
13 data = np.array(data, dtype=np.float32)
14 max_row = data.max(axis=0)
---> 15 c = np.array([len(str(int(number))) for number in np.abs(max_row)])
16 return data/(10**c)
17
ValueError: cannot convert float NaN to integer
I'm not sure what went wrong.
Upvotes: 1
Views: 121
Reputation: 69192
NumPy floats allow NaN values, but ints don't. So the NaN propagates through your float calculations until it hits the int() conversion.
That is, reading your data produces some NaN values. Then max(axis=0) returns NaN for every column that contains one, abs passes the NaN through unchanged, and finally int() complains.
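To make that chain concrete, here is a minimal sketch on made-up toy data (the array values are my own assumption, not taken from your CSV). It shows a single NaN surviving max and abs and then failing at int():

```python
import numpy as np

# Toy data (made up): the second column contains one NaN.
data = np.array([[1.5, 20.0],
                 [3.0, np.nan]], dtype=np.float32)

max_per_column = data.max(axis=0)   # a column containing NaN has a NaN maximum
abs_max = np.abs(max_per_column)    # abs leaves NaN as NaN

try:
    int(abs_max[1])                 # same conversion as in the list comprehension
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(max_per_column)
print(error_message)
```

The first column converts fine; only the column holding the NaN triggers the ValueError.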
Try:
data = np.array(data, dtype=np.float32) # from your code
print(np.argwhere(np.isnan(data)))
to find where your NaN values are.
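Once you know where the NaN values are, you have to decide what to do with them (drop the rows, fill them in, or skip them). As one possible sketch, assuming you just want to ignore them when computing the column maxima, np.nanmax can stand in for max. The function name decimal_scaling_nan_safe and the sample array are hypothetical, and a column that is entirely NaN would still fail at int():

```python
import numpy as np

def decimal_scaling_nan_safe(data):
    """Hypothetical NaN-tolerant variant of the question's decimal_scaling."""
    data = np.array(data, dtype=np.float32)
    # Column maxima, skipping NaN entries instead of propagating them.
    max_col = np.nanmax(data, axis=0)
    # Digit count of the integer part of each column maximum, as in the question.
    c = np.array([len(str(int(number))) for number in np.abs(max_col)])
    return data / (10.0 ** c)

# Made-up sample with one NaN in each column.
sample = np.array([[12.0, 0.5],
                   [np.nan, 0.25],
                   [345.0, np.nan]])

nan_positions = np.argwhere(np.isnan(sample))   # (row, column) index pairs
scaled = decimal_scaling_nan_safe(sample)

print(nan_positions)
print(scaled)
```

The NaN entries themselves stay NaN in the result, so you would still need to handle them before feeding X into a model.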
Upvotes: 1