Shubham Agrawal

Reputation: 88

Summing dataframe columns gives unexpected output

My dataset has two columns that I want to add together so that I can compute each one's share of the total. To do that I wrote:

complaints_by_state['total_complaints'] = complaints_by_state.Open + complaints_by_state.Closed

It works fine except for one row. For Georgia, the sum of 208 and 80 is calculated as 32:

             Closed  Open  percentage_unresolved  total_complaints
State
Alabama          17     9               0.346154                26
Arizona          14     6               0.300000                20
Arkansas          6     0               0.000000                 6
California      159    61               0.277273               220
Colorado         58    22               0.275000                80
Connecticut       9     3               0.250000                12
Delaware          8     4               0.333333                12
Columbia         14     2               0.125000                16
Florida         201    39               0.162500               240
Georgia         208    80               2.500000                32
Illinois        135    29               0.176829               164

What's happening here and how can it be resolved?

Upvotes: 1

Views: 35

Answers (1)

Ruggero Turra

Reputation: 17730

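It seems you are using 8-bit unsigned integers. They can only hold values from 0 to 255, so 208 + 80 = 288 wraps around to 288 - 256 = 32: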

import numpy as np
np.uint8(208) + np.uint8(80)

# returns
<stdin>:1: RuntimeWarning: overflow encountered in ubyte_scalars
32
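The same wrap-around happens element-wise when whole columns are stored as uint8. Here is a minimal sketch, assuming a small made-up frame that holds only the Florida and Georgia rows from the question (the name `demo` is hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up two-row frame with the question's values, stored as uint8 on purpose.
demo = pd.DataFrame(
    {"Closed": [201, 208], "Open": [39, 80]},
    index=["Florida", "Georgia"],
    dtype=np.uint8,
)

# uint8 + uint8 stays uint8, so any sum above 255 wraps modulo 256.
total = demo["Closed"] + demo["Open"]

print(demo.dtypes)     # dtype of each column
print(total.dtype)     # dtype of the sum
print(total.tolist())  # Florida fits in 8 bits, Georgia does not
```

Florida's sum (240) still fits in 8 bits, which would explain why only the Georgia row looks wrong.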

Use complaints_by_state.dtypes to check. If that is the case, convert the column to a wider integer type:

complaints_by_state['Open'] = complaints_by_state['Open'].astype(np.int64)

and do the same for Closed.
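If uint8 does turn out to be the cause, both columns can be widened in one call and the derived columns recomputed. This is a sketch on a made-up stand-in for complaints_by_state. The formula Open / total_complaints for percentage_unresolved is inferred from the table in the question and is an assumption:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the asker's frame, deliberately stored as uint8.
complaints_by_state = pd.DataFrame(
    {"Closed": [201, 208], "Open": [39, 80]},
    index=pd.Index(["Florida", "Georgia"], name="State"),
    dtype=np.uint8,
)

# Widen both count columns at once; astype accepts a column -> dtype mapping.
complaints_by_state = complaints_by_state.astype(
    {"Closed": "int64", "Open": "int64"}
)

# Recompute the derived columns after the conversion.
complaints_by_state["total_complaints"] = (
    complaints_by_state["Open"] + complaints_by_state["Closed"]
)
# Assumed formula: share of open complaints in the total.
complaints_by_state["percentage_unresolved"] = (
    complaints_by_state["Open"] / complaints_by_state["total_complaints"]
)

print(complaints_by_state)
```

Any column derived from the wrapped totals has to be recomputed too, since it was calculated from the wrong sum.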

It is better to fix the dtype when you import the data, so that every later calculation already uses a wide enough type.
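The question does not show how the data is loaded, so the following is only a sketch that assumes a CSV source. The inline CSV text and its column layout are made up for illustration. The dtype argument of pd.read_csv sets the column types at load time:

```python
import io

import pandas as pd

# Made-up CSV content standing in for the real file.
csv_text = "State,Closed,Open\nFlorida,201,39\nGeorgia,208,80\n"

# Request 64-bit integers for the count columns while reading.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="State",
    dtype={"Closed": "int64", "Open": "int64"},
)

df["total_complaints"] = df["Open"] + df["Closed"]
print(df.dtypes)
print(df)
```

If the counts come from some other step, for example an aggregation or an explicit downcast to save memory, that step would be the place to set the wider type instead.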

Upvotes: 1
