Reputation: 23
First, I did
import numpy as np
import pandas as pd

a = [[6,5,4,3,2],[1,2,3,4,5,6],[3,4,5,6]]
b = pd.DataFrame(a)
print(b)
The output is
   0  1  2  3    4    5
0  6  5  4  3  2.0  NaN
1  1  2  3  4  5.0  6.0
2  3  4  5  6  NaN  NaN
So I did
a = [[6,5,4,3,2],[1,2,3,4,5,6],[3,4,5,6]]
b = pd.DataFrame(a).fillna(-1).astype(int)
print(b)
The output becomes
   0  1  2  3  4  5
0  6  5  4  3  2 -1
1  1  2  3  4  5  6
2  3  4  5  6 -1 -1
But I don't want those -1s, so I did
a = [[6,5,4,3,2],[1,2,3,4,5,6],[3,4,5,6]]
b = pd.DataFrame(a).fillna(-1).astype(int)
b = b.replace(-1, np.nan)
print(b)
The output is again the same as the first time
   0  1  2  3    4    5
0  6  5  4  3  2.0  NaN
1  1  2  3  4  5.0  6.0
2  3  4  5  6  NaN  NaN
Upvotes: 2
Views: 1558
Reputation: 402423
Because of this:
type(np.nan)
# float
If you have NaNs in your column, the rest of the column is automatically upcast to float for efficient computation.
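A minimal illustration of that upcast (my own example, not from the original post): the same integer data gets a different dtype as soon as one NaN appears.

```python
import numpy as np
import pandas as pd

# All-integer data stays int64...
print(pd.Series([1, 2, 3]).dtype)       # int64

# ...but a single NaN (a float) upcasts the whole column to float64.
print(pd.Series([1, 2, np.nan]).dtype)  # float64
```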
We can use the Nullable Integer Type, which allows integers to coexist with NaNs:
b = b.astype('Int32')
b
   0  1  2  3    4    5
0  6  5  4  3    2  NaN
1  1  2  3  4    5    6
2  3  4  5  6  NaN  NaN
b.dtypes
0 Int32
1 Int32
2 Int32
3 Int32
4 Int32
5 Int32
dtype: object
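As an aside (a sketch of mine, assuming a recent pandas version): if the nullable dtype is the goal, the fillna(-1)/replace round-trip from the question is not needed; astype('Int32') can be applied directly to the float frame.

```python
import pandas as pd

a = [[6, 5, 4, 3, 2], [1, 2, 3, 4, 5, 6], [3, 4, 5, 6]]

# Convert the float frame straight to the nullable integer dtype;
# NaN cells become missing values in the Int32 columns.
b = pd.DataFrame(a).astype('Int32')
print(b.dtypes.unique())
```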
To get around that, convert the dtype to object, which I don't recommend unless it's only for display purposes (you kill efficiency this way).
u = b.select_dtypes(float)  # b here is the original float DataFrame, not the Int32 one
b[u.columns] = u.astype(object)
b
   0  1  2  3    4    5
0  6  5  4  3    2  NaN
1  1  2  3  4    5    6
2  3  4  5  6  NaN  NaN
print(b.dtypes)
0 int64
1 int64
2 int64
3 int64
4 object
5 object
dtype: object
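For completeness, a sketch of mine (not part of the original answer, assumes pandas 1.0+): convert_dtypes infers nullable dtypes for every column in one call, including the float columns whose values are all whole numbers.

```python
import pandas as pd

a = [[6, 5, 4, 3, 2], [1, 2, 3, 4, 5, 6], [3, 4, 5, 6]]

# Every column is inferred as nullable Int64; the NaN-carrying float
# columns become integer columns with missing values.
b = pd.DataFrame(a).convert_dtypes()
print(b.dtypes.unique())
```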
Upvotes: 2