Reputation: 781
I have a DataFrame with an int8 column to keep memory usage low.
In [1]: df = pd.DataFrame({'a': [100, 50]}, dtype='int8')
   ...: df
Out[1]:
     a
0  100
1   50
In [2]: df.dtypes
Out[2]:
a    int8
dtype: object
sum automatically promotes the result to int64 and gives the correct result.
In [3]: df.sum()
Out[3]:
a    150
dtype: int64
But a + or * operation does not do so.
In [4]: df.loc[0, 'a'] + df.loc[1, 'a']
C:\Users\bubai\AppData\Local\Temp\ipykernel_33164\1219674856.py:1: RuntimeWarning: overflow encountered in byte_scalars
  df.loc[0, 'a'] + df.loc[1, 'a']
Out[4]: -106
In [5]: df['a'] * 4
Out[5]:
0   -112
1    -56
Name: a, dtype: int8
So pandas automatically upcasts the result in one place but not in others. Is this an inconsistency in pandas, or non-standard coding on my end? If I have such arithmetic operations in my code, how can I avoid the incorrect results?
Upvotes: 0
Views: 395
Reputation: 262284
NumPy behaves the same way. Summing an int8 array also returns a wider integer:
np.array([100, 50], dtype=np.int8).sum()
Output: 150
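To make the difference concrete, here is a minimal sketch. As far as I know, NumPy reductions such as sum accumulate small integer dtypes in the platform's default integer (int64 on most systems, so treat the exact width as an assumption), while elementwise arithmetic keeps the operand dtype and wraps around. The variable names are only for illustration.

```python
import numpy as np

arr = np.array([100, 50], dtype=np.int8)

# Reduction: the accumulator is widened by default, so no overflow here.
total = arr.sum()
print(total, total.dtype)

# Elementwise arithmetic keeps int8, so 400 and 200 wrap around.
scaled = arr * 4
print(scaled.tolist(), scaled.dtype)

# Forcing the accumulator to int8 makes the sum wrap as well.
wrapped = arr.sum(dtype=np.int8)
print(wrapped, wrapped.dtype)
```

So the upcast in sum is a property of reductions, not something pandas applies selectively.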
If you need the sum itself as int8, convert it explicitly (note that the value then wraps around):
df.sum().astype(np.int8)
Output:
a   -106
dtype: int8
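For the arithmetic in the question, one possible approach (a sketch, not the only option) is to keep the column stored as int8 and upcast only for the computation. This assumes the temporary int64 copy is affordable in memory; the names scaled and total are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [100, 50]}, dtype='int8')

# Upcast only for the calculation; the stored column stays int8.
scaled = df['a'].astype('int64') * 4
print(scaled.tolist())

# Values read with .loc are NumPy int8 scalars; converting them to
# Python ints first avoids the wraparound when adding them.
total = int(df.loc[0, 'a']) + int(df.loc[1, 'a'])
print(total)

# The original column is unchanged.
print(df['a'].dtype)
```

If the result is known to fit, it can be cast back down afterwards with astype.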
Upvotes: 1