arahpanah

Reputation: 349

While doing data normalization, I always get ValueError: cannot convert float NaN to integer

I'm trying to normalize my CSV data with decimal scaling, using this code:

import numpy as np

def decimal_scaling(data):
    data = np.array(data, dtype=np.float32)
    max_row = data.max(axis=0)
    c = np.array([len(str(int(number))) for number in np.abs(max_row)])
    return data/(10**c)

X = decimal_scaling(
            glcm_df[['dissimilarity_0', 'dissimilarity_45', 'dissimilarity_90', 'dissimilarity_135', 
                     'correlation_0', 'correlation_45', 'correlation_90', 'correlation_135', 
                     'homogeneity_0', 'homogeneity_45', 'homogeneity_90', 'homogeneity_135', 
                     'contrast_0', 'contrast_45', 'contrast_90', 'contrast_135', 
                     'ASM_0', 'ASM_45', 'ASM_90', 'ASM_135',
                     'energy_0', 'energy_45', 'energy_90', 'energy_135']].values)

But every time I run it, I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-5b1233475b8c> in <module>
     22                      'contrast_0', 'contrast_45', 'contrast_90', 'contrast_135',
     23                      'ASM_0', 'ASM_45', 'ASM_90', 'ASM_135',
---> 24                      'energy_0', 'energy_45', 'energy_90', 'energy_135']].values)

<ipython-input-21-5b1233475b8c> in decimal_scaling(data)
     13     data = np.array(data, dtype=np.float32)
     14     max_row = data.max(axis=0)
---> 15     c = np.array([len(str(int(number))) for number in np.abs(max_row)])
     16     return data/(10**c)
     17 

<ipython-input-21-5b1233475b8c> in <listcomp>(.0)
     13     data = np.array(data, dtype=np.float32)
     14     max_row = data.max(axis=0)
---> 15     c = np.array([len(str(int(number))) for number in np.abs(max_row)])
     16     return data/(10**c)
     17 

ValueError: cannot convert float NaN to integer

I'm not sure what went wrong.

Upvotes: 1

Views: 121

Answers (1)

tom10

Reputation: 69192

NumPy floats can hold NaN values, but ints can't. So a NaN propagates through your float calculations until it hits the int() conversion.

That is, reading your data produces some NaN values. With axis=0, max then returns NaN for every column that contains one, abs leaves that NaN unchanged, and finally int() raises the ValueError.
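To see the propagation in isolation, here is a minimal sketch on a made-up 3x2 array (the values are hypothetical, not taken from your CSV). It follows the same steps as your function and captures the error message instead of crashing:

```python
import numpy as np

# Hypothetical sample: the second column has one missing value.
data = np.array([[1.0, 20.0],
                 [3.0, np.nan],
                 [2.0, 5.0]], dtype=np.float32)

max_row = data.max(axis=0)   # per-column max; NaN wins in any column containing it
abs_max = np.abs(max_row)    # abs() leaves NaN unchanged

try:
    int(abs_max[1])          # same conversion as in the list comprehension
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(max_row)
print(error_message)
```

The first column converts fine; only the column with the NaN trips the int() call.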

Try:

data = np.array(data, dtype=np.float32) # from your code
print(np.argwhere(np.isnan(data)))

to find where your NaN values are.
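Once you know where the NaNs are, one possible way forward is to skip them when computing the per-column maximum. The sketch below is only one option, not the only fix: the name decimal_scaling_nan_safe is made up for illustration, it assumes you want NaN entries to stay NaN in the output, and it assumes no column is entirely NaN (it raises a clearer error if one is). It also takes the maximum of the absolute values, which is what decimal scaling usually calls for, rather than the absolute value of the maximum as in your code.

```python
import numpy as np

def decimal_scaling_nan_safe(data):
    """Decimal scaling that ignores NaN when finding each column's maximum.

    Hypothetical helper: NaN entries are left as NaN in the result.
    """
    data = np.asarray(data, dtype=np.float64)

    # A column with no valid values has no maximum to scale by.
    if np.isnan(data).all(axis=0).any():
        raise ValueError("at least one column contains only NaN values")

    # Largest absolute value per column, skipping NaN entries.
    col_max = np.nanmax(np.abs(data), axis=0)

    # Digit count of the integer part, as in the original code.
    digits = np.array([len(str(int(m))) for m in col_max])

    return data / (10.0 ** digits)


# Made-up example input, not real GLCM features.
sample = np.array([[1.0, 250.0],
                   [np.nan, -999.0],
                   [5.0, 30.0]])
scaled = decimal_scaling_nan_safe(sample)
print(scaled)
```

If you would rather not carry NaNs forward at all, dropping or imputing the affected rows before scaling is the other common choice; which one is right depends on why the values are missing in your CSV.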

Upvotes: 1
