Anirban Chakraborty

Reputation: 781

Pandas dataframe with int8 column showing inconsistent arithmetic (sum and product)

I have a dataframe with an int8 column to keep memory usage low.

In [1]: df = pd.DataFrame({'a': [100, 50]}, dtype='int8')
        df
Out[1]:
     a
0  100
1   50

In [2]: df.dtypes
Out[2]: a    int8
        dtype: object

df.sum() automatically promotes the result to int64 and gives the correct value.

In [3]: df.sum()
Out[3]:
a    150
dtype: int64

But the + and * operators do not: the result stays int8 and overflows.

In [4]: df.loc[0, 'a'] + df.loc[1, 'a']
C:\Users\bubai\AppData\Local\Temp\ipykernel_33164\1219674856.py:1: RuntimeWarning: overflow encountered in byte_scalars
  df.loc[0, 'a'] + df.loc[1, 'a']
Out[4]: -106

In [5]: df['a'] * 4
Out[5]: 0   -112
        1    -56
        Name: a, dtype: int8

So in one case pandas automatically upcasts the result, whereas in the others it does not. Is this an inconsistency in pandas, or non-standard coding on my end? If my code contains such arithmetic operations, how can I avoid the incorrect results?

Upvotes: 0

Views: 395

Answers (1)

mozway

Reputation: 262284

NumPy behaves the same way; its sum also upcasts small integer types:

np.array([100, 50], dtype=np.int8).sum()

Output: 150
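
To make the difference concrete, here is a small sketch (variable names are mine, not from the question). As far as I know, NumPy's sum accumulates small signed integers in the platform integer by default, while elementwise operators keep the input dtype, so only the latter wraps around:

```python
import numpy as np

arr = np.array([100, 50], dtype=np.int8)

# Reduction: sum() accumulates in a wider integer type by default,
# so 100 + 50 does not overflow.
total = arr.sum()
print(total, total.dtype)

# Elementwise arithmetic keeps int8, so values outside [-128, 127] wrap.
scaled = arr * 4
print(scaled, scaled.dtype)

# Forcing the accumulator to int8 reproduces the wrapped result.
wrapped = arr.sum(dtype=np.int8)
print(wrapped)
```

The exact wide type used by sum() depends on the platform and NumPy version, so I would not rely on it being int64 everywhere.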

If the result must be an int8, perform an explicit conversion (note that 150 does not fit in int8, so it wraps around):

df.sum().astype(np.int8)

Output:

a   -106
dtype: int8
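
For the last part of the question (avoiding the wrong results), one possible approach is to keep the column stored as int8 and upcast only for the computation. This is a minimal sketch, assuming int64 is wide enough for your values; the names wide, product and scalar_sum are just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [100, 50]}, dtype='int8')

# Upcast before elementwise arithmetic so the result has room to grow.
wide = df['a'].astype('int64')
product = wide * 4
print(product.tolist())  # [400, 200]

# For single values, convert the NumPy scalars to Python ints first.
scalar_sum = int(df.loc[0, 'a']) + int(df.loc[1, 'a'])
print(scalar_sum)  # 150

# The stored column is untouched and still int8.
print(df['a'].dtype)
```

The upcast creates a temporary wider copy, so the memory saving applies to storage but not to the intermediate result.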

Upvotes: 1
