mknote

Reputation: 177

Converting NaN to Integer

I need to convert the NaN value to an integer in Python (NumPy, specifically), which unfortunately throws an error. For those unfamiliar with the issue, here is a MWE showcasing it:

import numpy as np

test_data = [[2.3, 4], [1.1, np.nan]]

test_array = np.array(test_data, dtype=[("col1", float), ("col2", int)])

Running this code produces the error ValueError: cannot convert float NaN to integer. There have been questions regarding this previously, most notably here and here, but they only offer workarounds that aren't useful in my situation. Here are some of the solutions they've given, plus a few I've thought of, each with the reason it doesn't work for me:

  1. Remove the rows that have NaN values. Unfortunately, the entire point of the table I'm creating is to present a list of targets, some of which lack data in certain parameters, so removing these rows defeats the whole purpose of making the table to begin with.
  2. Replace the NaN values with 0. Unfortunately, in several cases, this would actually imply things to the reader that are incorrect. Furthermore, the journal I'm submitting to requires such spaces to be blank, and using NaN is the only way I know of to produce a blank space in a numeric cell.
  3. Convert the columns in question to floats so that I can use NaN without issues. Unfortunately, one of the columns I'm having issues with indicates a particular (and changing per row) reference, and citing reference 1.0 sounds quite strange.
  4. Convert the columns in question to strings and just insert a blank string. Unfortunately, the format this is put in keeps track of the type of data in the column, and I think the journal would be unhappy with me for listing what are obviously integers as alphabetical characters. This applies to point 3, too.

So that's where I am. I need to have blank entries in a column of ints, and the only way I know of to get a blank entry (NaN) works for floats but not for ints in Python. The various workarounds suggested in other answers to questions of this sort are unworkable in my specific case. So how can I get this to work, either by somehow making NaN convert to an int or by otherwise inserting a blank int?

Upvotes: 2

Views: 6760

Answers (2)

GSA

Reputation: 813

A little late to the party, and I'm not sure if this is what you are looking for, but numpy.nan_to_num should be able to do that.

Using your example, this is what you could do:

test_data = [[2.3, 4], [1.1, np.nan]]

# converts NaN to the default value (0), then casts the array to int
np.nan_to_num(x=test_data).astype('int')
# array([[2, 4],
#        [1, 0]])

You could also specify a user-defined value for nan, as in the following example:

# converts nan to user-defined value (10)
np.nan_to_num(x=test_data, nan=10).astype('int')
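
As a self-contained sketch of the call above (assuming NumPy 1.17 or later, where the `nan` keyword is available), here is the same idea with -1 as the fill value. The choice of -1 is purely illustrative; any integer that cannot be mistaken for real data would do. Note that the cast applies to the whole array, so 2.3 and 1.1 are truncated as well.

```python
import numpy as np

test_data = [[2.3, 4], [1.1, np.nan]]

# Replace NaN with a sentinel value (-1 is an arbitrary, illustrative choice),
# then cast the whole array to int. The cast truncates 2.3 -> 2 and 1.1 -> 1.
filled = np.nan_to_num(test_data, nan=-1).astype(int)

print(filled)
```

This still puts a number in the cell rather than a blank, so it only helps if whatever writes the table can be told to render the sentinel as empty.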

Upvotes: 1

AJ Biffl

Reputation: 584

One thing would be to cast it as an array of type object:

test_array = np.array(test_data, dtype=object)

which preserves the floats 2.3 and 1.1, keeps nan as a float, and keeps 4 as an integer:

print(test_array)
print([type(val) for row in test_array for val in row])
> [[2.3 4]
   [1.1 nan]]
> [<class 'float'>, <class 'int'>, <class 'float'>, <class 'float'>]

If you want all the numbers cast as int, one thing you can do is cast what you can cast and leave the rest as-is:

array1 = np.array(test_data)

nan_indices = np.isnan(array1)

test_array = np.empty(array1.shape, dtype=object)
test_array[~nan_indices] = array1[~nan_indices].astype(int)
test_array[nan_indices] = np.nan

Then the printouts look like:

> [[2 4]
   [1 nan]]
> [<class 'int'>, <class 'int'>, <class 'int'>, <class 'float'>]
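
For reuse, the steps above could be wrapped in a small function. This is only a sketch: the name `to_int_or_nan` is hypothetical, and it assumes the input can first be converted to a float array.

```python
import numpy as np

def to_int_or_nan(data):
    """Hypothetical helper: cast what can be cast to int, leave NaN in place."""
    arr = np.asarray(data, dtype=float)
    nan_mask = np.isnan(arr)

    # An object array can hold ints and the float NaN side by side.
    out = np.empty(arr.shape, dtype=object)
    out[~nan_mask] = arr[~nan_mask].astype(int)  # truncates toward zero
    out[nan_mask] = np.nan
    return out

result = to_int_or_nan([[2.3, 4], [1.1, np.nan]])
print(result)
print([type(val) for val in result.ravel()])
```

Keep in mind that an object array gives up NumPy's vectorised integer arithmetic, so this is better suited to preparing a table for output than to further computation.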

Upvotes: 0
