Reputation: 88
My dataset has two columns that I want to add together, so that I can then work out each column's share of the total. For that I did this:
complaints_by_state['total_complaints'] = complaints_by_state.Open + complaints_by_state.Closed
It works fine except for one particular row. For Georgia, the sum of 208 and 80 is being calculated as 32:
             Closed  Open  percentage_unresolved  total_complaints
State
Alabama          17     9               0.346154                26
Arizona          14     6               0.300000                20
Arkansas          6     0               0.000000                 6
California      159    61               0.277273               220
Colorado         58    22               0.275000                80
Connecticut       9     3               0.250000                12
Delaware          8     4               0.333333                12
Columbia         14     2               0.125000                16
Florida         201    39               0.162500               240
Georgia         208    80               2.500000                32
Illinois        135    29               0.176829               164
What's happening here and how can it be resolved?
Upvotes: 1
Views: 35
Reputation: 17730
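It seems your columns are stored as 8-bit unsigned integers (uint8). That type can only hold values from 0 to 255, so 208 + 80 = 288 overflows and wraps around to 288 - 256 = 32: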
import numpy as np
np.uint8(208) + np.uint8(80)
# returns
<stdin>:1: RuntimeWarning: overflow encountered in ubyte_scalars
32
Use complaints_by_state.dtypes
to check. If that is the case, convert the columns to a wider integer type:
complaints_by_state['Open'] = complaints_by_state['Open'].astype(np.int64)
and do the same for Closed.
It is better to fix the dtype when you import the data, so the columns never end up as uint8 in the first place.
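To make the overflow and the fix concrete, here is a minimal sketch. The three-row frame is a hypothetical stand-in built from the numbers in the question and deliberately stored as uint8. I am assuming that is how your real columns are stored, so check .dtypes first.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame in the question, deliberately stored as uint8.
complaints_by_state = pd.DataFrame(
    {"Closed": [201, 208, 135], "Open": [39, 80, 29]},
    index=pd.Index(["Florida", "Georgia", "Illinois"], name="State"),
    dtype=np.uint8,
)

# uint8 holds 0..255, so Georgia's 208 + 80 = 288 wraps around to 288 - 256 = 32.
wrapped = complaints_by_state["Open"] + complaints_by_state["Closed"]

# Widen both columns first, then do the arithmetic on the widened copy.
widened = complaints_by_state.astype({"Open": np.int64, "Closed": np.int64})
widened["total_complaints"] = widened["Open"] + widened["Closed"]
widened["percentage_unresolved"] = widened["Open"] / widened["total_complaints"]

print(wrapped["Georgia"])                          # wrapped-around sum
print(widened.loc["Georgia", "total_complaints"])  # correct sum
```

If the data happens to come from a CSV file, pd.read_csv accepts a dtype mapping (for example dtype={"Open": "int64", "Closed": "int64"}), which is one way to fix this at import time. Whether that applies depends on how complaints_by_state was actually built.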
Upvotes: 1