MainTankJerry

Reputation: 23

Replacing integers with NaN results in the entire column becoming float dtype

First, I did

import pandas as pd

a = [[6, 5, 4, 3, 2], [1, 2, 3, 4, 5, 6], [3, 4, 5, 6]]
b = pd.DataFrame(a)
print(b)

The output is

   0  1  2  3    4    5
0  6  5  4  3  2.0  NaN
1  1  2  3  4  5.0  6.0
2  3  4  5  6  NaN  NaN

So I did

a = [[6, 5, 4, 3, 2], [1, 2, 3, 4, 5, 6], [3, 4, 5, 6]]
b = pd.DataFrame(a).fillna(-1).astype(int)
print(b)

The output becomes

   0  1  2  3  4  5
0  6  5  4  3  2 -1
1  1  2  3  4  5  6
2  3  4  5  6 -1 -1

But I don't want those -1 values, so I did

import numpy as np

a = [[6, 5, 4, 3, 2], [1, 2, 3, 4, 5, 6], [3, 4, 5, 6]]
b = pd.DataFrame(a).fillna(-1).astype(int)
b = b.replace(-1, np.nan)
print(b)

The output is again the same as the first time:

   0  1  2  3    4    5
0  6  5  4  3  2.0  NaN
1  1  2  3  4  5.0  6.0
2  3  4  5  6  NaN  NaN

Upvotes: 2

Views: 1558

Answers (1)

cs95

Reputation: 402423

Because of this:

type(np.nan)
# float

If you have NaNs in your column, the rest of the column is automatically upcast to float so everything can still live in one efficient NumPy array; a NumPy integer array has no way to represent a missing value.
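You can see the upcast in isolation with a minimal sketch (my example, not part of the question's data):

import numpy as np
import pandas as pd

# a plain list of ints gives an integer column...
print(pd.Series([1, 2, 3]).dtype)        # int64

# ...but one NaN in the data forces the whole column to float64,
# because np.nan is itself a float
print(pd.Series([1, 2, np.nan]).dtype)   # float64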

pandas 0.24+

We can use the nullable integer type, which allows integers to coexist with NaNs:

b = b.astype('Int32')   # capital 'I': pandas' nullable dtype, not numpy's int32
b

   0  1  2  3    4    5
0  6  5  4  3    2  NaN
1  1  2  3  4    5    6
2  3  4  5  6  NaN  NaN

b.dtypes

0    Int32
1    Int32
2    Int32
3    Int32
4    Int32
5    Int32
dtype: object
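As a side note (my sketch, not from the original answer), the nullable dtype can also be requested when building the data directly, for example with pd.array; the missing value is shown as NaN on 0.24 and as <NA> on pandas 1.0+:

import pandas as pd

arr = pd.array([1, 2, None], dtype='Int32')   # nullable integer array
print(arr.dtype)                              # Int32

# arithmetic keeps the nullable integer dtype instead of falling back to float
print((pd.Series(arr) + 1).dtype)             # still Int32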

<= 0.23

To get around this on older versions, convert the dtype to object, which I don't recommend unless it's only for display purposes (you kill efficiency this way).

# select the float columns (the ones that picked up NaN)
u = b.select_dtypes(float)
# store them as generic Python objects instead of a float64 block
b[u.columns] = u.astype(object)
b

   0  1  2  3    4    5
0  6  5  4  3    2  NaN
1  1  2  3  4    5    6
2  3  4  5  6  NaN  NaN

print(b.dtypes)
0     int64
1     int64
2     int64
3     int64
4    object
5    object
dtype: object
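A caveat worth spelling out (my note, not part of the original answer): object columns fall back to plain-Python speed, so if you need to compute on them again, convert back to a numeric dtype, which of course restores the float64/NaN behaviour:

# hypothetical follow-up: u.columns are the object columns created above
b[u.columns] = b[u.columns].apply(pd.to_numeric)
print(b.dtypes)   # columns 4 and 5 are float64 again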

Upvotes: 2
