Stef1611

Reputation: 2399

Problem in pandas: impossible to sum integers with arbitrary precision

I tried to sum large integers in pandas, but the result is not what I expected.

Input file : my_file_lg_int

my_int
111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222

Python code

import pandas as pd

file = 'my_file_lg_int'
data = pd.read_csv(file)
data['my_int'].sum()

The output is :

111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222

Because the integers are too large for a 64-bit integer, pandas reads them as strings, so the sum concatenates them instead of adding them. I tried data = pd.read_csv(file, dtype={'my_int': int}), but that raises an overflow error. How can I solve this?

Upvotes: 0

Views: 93

Answers (3)

medium-dimensional

Reputation: 2253

We can use the decimal module to solve this. According to the documentation:

Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem:

Since the number in any given row here has 102 digits, we can choose to set the precision to 103 digits. This method will not, however, give an exact result if the sum needs more than 103 significant digits: the result is then rounded to the context precision.

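import pandas as pd
import decimal
from decimal import Decimal

# Allow up to 103 significant digits in Decimal arithmetic.
decimal.setcontext(decimal.Context(prec=103))

file = 'my_file_lg_int'
# Read the column as text so that no digits are lost on import.
df = pd.read_csv(file, dtype={"my_int": str})
x = Decimal("0")

# Convert each value to Decimal and accumulate the total.
for i in df['my_int']:
    x = x + Decimal(i)

print(x)
print(type(x))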

This gives:

333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333 
<class 'decimal.Decimal'>
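
To illustrate the precision caveat, here is a minimal sketch that uses only the standard library. The two values are built with string repetition rather than read from the file, and the 10-digit context is purely illustrative; both are assumptions made for the demonstration. It shows that constructing a Decimal from a string is exact, while the addition is rounded to the context precision.

```python
from decimal import Context, Decimal, localcontext

a = Decimal("1" * 102)  # construction from a string is always exact
b = Decimal("2" * 102)

# With too few significant digits, the sum is rounded.
with localcontext(Context(prec=10)):
    rounded = a + b

# Size the precision from the data: for two addends, one extra digit
# leaves room for a possible carry.
digits = max(len(a.as_tuple().digits), len(b.as_tuple().digits)) + 1
with localcontext(Context(prec=digits)):
    exact = a + b

print(rounded == exact)              # False: the 10-digit sum lost information
print(exact == Decimal("3" * 102))   # True
```

When more than two values are summed, more carry room may be needed, so deriving the precision from the data is safer than hard-coding it.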

Upvotes: 1

Tim Roberts

Reputation: 54767

Many tasks are easier without hauling in the enormous pandas and numpy modules.

filename = 'my_file_lg_int'
with open(filename) as f:
    next(f)  # skip the 'my_int' header line
    mysum = sum(int(line) for line in f if line.strip())
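
A self-contained way to try the pure-Python approach without creating a file, assuming the layout shown in the question (a my_int header followed by one integer per line). Here io.StringIO stands in for the real file, and the helper name sum_int_column is hypothetical.

```python
import io


def sum_int_column(lines):
    """Sum one integer per line, skipping the header row and blank lines."""
    rows = iter(lines)
    next(rows, None)  # discard the header ('my_int' in the question's file)
    return sum(int(row) for row in rows if row.strip())


# Stand-in for the input file: a header and two 102-digit integers.
sample = io.StringIO("my_int\n" + "1" * 102 + "\n" + "2" * 102 + "\n")
total = sum_int_column(sample)
print(total == int("3" * 102))  # True
```

Python's built-in int has arbitrary precision, so no precision setting is needed here.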

Upvotes: 2

Chrysophylaxs

Reputation: 6583

Perhaps:

df["my_int"].apply(int).sum()
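
A minimal end-to-end sketch of this idea, under a few assumptions: io.StringIO stands in for the input file, the two values are generated with string repetition, and the column is read explicitly with dtype str so that it arrives as text regardless of how pandas would otherwise infer it.

```python
import io

import pandas as pd

# Stand-in for the input file: a header and two 102-digit integers.
csv_text = "my_int\n" + "1" * 102 + "\n" + "2" * 102 + "\n"

# Read the column as text, then convert each entry to a Python int.
df = pd.read_csv(io.StringIO(csv_text), dtype={"my_int": str})
total = df["my_int"].apply(int).sum()

print(total == int("3" * 102))
```

The converted column holds Python int objects rather than 64-bit integers, which is why the sum is not subject to the overflow seen with dtype int.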

Upvotes: 2
