Reputation: 3753

Why does converting np.nan to int result in a huge number?

I have a NumPy array like this:

array([['18.0', '11.0', '5.0', ..., '19.0', '18.0', '20.0'],
       ['11.0', '14.0', '15.0', ..., '45.0', '26.0', '20.0'],
       ['1.0', '0.0', '1.0', ..., '3.0', '4.0', '17.0'],
       ...,
       ['nan', 'nan', 'nan', ..., 'nan', 'nan', 'nan'],
       ['nan', 'nan', 'nan', ..., 'nan', 'nan', 'nan'],
       ['nan', 'nan', 'nan', ..., 'nan', 'nan', 'nan']],
      dtype='|S230')

But converting it to an int array turns the NaN values into weird numbers:

df[:,4:].astype('float').astype('int')

array([[                  18,                   11,                    5,
        ...,                   19,                   18,
                          20],
       [                  11,                   14,                   15,
        ...,                   45,                   26,
                          20],
       [                   1,                    0,                    1,
        ...,                    3,                    4,
                          17],
       ...,
       [-9223372036854775808, -9223372036854775808, -9223372036854775808,
        ..., -9223372036854775808, -9223372036854775808,
        -9223372036854775808],
       [-9223372036854775808, -9223372036854775808, -9223372036854775808,
        ..., -9223372036854775808, -9223372036854775808,
        -9223372036854775808],
       [-9223372036854775808, -9223372036854775808, -9223372036854775808,
        ..., -9223372036854775808, -9223372036854775808,
        -9223372036854775808]])

So how can I fix this?

Upvotes: 2

Answers (2)

Ofer Sadan

Reputation: 11972

It all depends on what you expect the result to be. NaN is a float value, so converting the string 'nan' to float is no problem. But there is no defined conversion from NaN to an integer.
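Plain Python makes the same point explicitly: parsing the string 'nan' as a float succeeds, but `int()` refuses to convert NaN and raises an error, whereas NumPy's `astype` performs the cast without raising an exception. A minimal sketch using only the standard library:

```python
import math

x = float('nan')          # parsing the string 'nan' as a float works
print(math.isnan(x))      # True

try:
    int(x)                # Python refuses: NaN has no integer value
except ValueError as err:
    print(err)            # cannot convert float NaN to integer
```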

I suggest you handle it differently: first choose which specific int you want all the NaN values to become (for example 0), and only then convert the whole array to int.

import numpy as np

a = np.array(['1','2','3','nan','nan'])
a[a=='nan'] = 0  # replace every 'nan' string with '0' (or choose another number)
a = a.astype('int')

Now a is equal to

array([1, 2, 3, 0, 0])
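The string comparison above works when the non-NaN entries are integer literals such as '1'. The strings in the question look like '18.0', which `astype('int')` would reject, so one variant is to parse to float first, replace NaN there, and only then cast to int. A sketch under that assumption; `to_int_with_fill` is a hypothetical helper name, and 0 as the default fill value is an arbitrary choice:

```python
import numpy as np

def to_int_with_fill(strings, fill=0):
    """Parse numeric strings (possibly including 'nan') into int64,
    replacing NaN with `fill` *before* the integer cast."""
    as_float = np.asarray(strings).astype(np.float64)  # astype returns a copy
    as_float[np.isnan(as_float)] = fill                # choose what NaN becomes
    return as_float.astype(np.int64)

a = np.array(['18.0', '11.0', '5.0', 'nan'])
print(to_int_with_fill(a).tolist())      # [18, 11, 5, 0]
print(to_int_with_fill(a, -1).tolist())  # [18, 11, 5, -1]
```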

Upvotes: 1

juanpa.arrivillaga

Reputation: 96324

Converting a floating-point NaN to an integer type is undefined behavior, as far as I know. The number

-9223372036854775808

is the smallest int64, i.e. -2**63. Note that the same thing happens on my system when I coerce to int32:

>>> arr
array([['18.0', '11.0', '5.0', 'nan']],
      dtype='<U4')
>>> arr.astype('float').astype(np.int32)
array([[         18,          11,           5, -2147483648]], dtype=int32)
>>> -2**31
-2147483648
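The sentinels can be compared against `np.iinfo`, and since the NaN-to-int result is not something to rely on, one option is to check for NaN before casting. A sketch; `safe_int_cast` is a hypothetical helper, not a NumPy function:

```python
import numpy as np

# The values seen above are the minimums of the signed integer types.
print(np.iinfo(np.int64).min)  # -9223372036854775808
print(np.iinfo(np.int32).min)  # -2147483648

def safe_int_cast(values, dtype=np.int64):
    """Cast to an integer dtype, but refuse if any NaN is present,
    because the NaN-to-int result is not well defined."""
    as_float = np.asarray(values, dtype=np.float64)
    if np.isnan(as_float).any():
        raise ValueError("array contains NaN; replace it before casting to int")
    return as_float.astype(dtype)

print(safe_int_cast(['18.0', '11.0', '5.0']).tolist())  # [18, 11, 5]
```

Raising leaves the decision about what NaN should become with the caller, which pairs with the fill-first approach from the other answer.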

Upvotes: 2
