Getting Numpy overflow despite declaring dtype=int64

Question

I'm Downloading stock prices from Yahoo for the S&P500, which has volume too big for a 32-bit integer.

def yahoo_prices(ticker, start_date=None, end_date=None, data='d'):

    csv = yahoo_historical_data(ticker, start_date, end_date, data)

    d = [('date',      np.datetime64),
         ('open',      np.float64),
         ('high',      np.float64),
         ('low',       np.float64),
         ('close',     np.float64),
         ('volume',    np.int64),
         ('adj_close', np.float64)]

    return np.recfromcsv(csv, dtype=d)

Here's the error:

>>> sp500 = yahoo_prices('^GSPC')
Traceback (most recent call last):
  File "", line 108, in 
  File "", line 74, in yahoo_prices
  File "/usr/local/lib/python2.6/dist-packages/numpy/lib/npyio.py", line 1812, in recfromcsv
    output = genfromtxt(fname, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/numpy/lib/npyio.py", line 1646, in genfromtxt
    output = np.array(data, dtype=ddtype)
OverflowError: long int too large to convert to int

Why would I still be getting this error if I declared the dtype to use int64? Is this an indication that the io function isn't really using my dtype sequence d?

===Edit ... example csv added===

Date,Open,High,Low,Close,Volume,Adj Close
2012-06-15,1329.19,1343.32,1329.19,1342.84,4401570000,1342.84
2012-06-14,1314.88,1333.68,1314.14,1329.10,3687720000,1329.10
2012-06-13,1324.02,1327.28,1310.51,1314.88,3506510000,1314.88

KobeJohn · Accepted Answer

I'm not sure, but I think you found a bug in numpy. I filed it here.

As I said there, if you open npyio.py and edit this line within recfromcsv:

kwargs.update(dtype=kwargs.get('update', None),

to this:

kwargs.update(dtype=kwargs.get('dtype', None),

Then it works for me with no problem for the long integer (I didn't check the datetime correctness as Joe wrote in his answer). You may notice that your dates weren't being converted either. Here is the specific code that works. The contents of "test.csv" are copy pasted from your example csv data.

import numpy as np
d = [('date',      np.datetime64),
    ('open',      np.float64),
    ('high',      np.float64),
    ('low',       np.float64),
    ('close',     np.float64),
    ('volume',    np.int64),
    ('adj_close', np.float64)]
a = np.recfromcsv("test.csv", dtype=d)
print(a)

[ (datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 1329.19, 1343.32, 1329.19, 1342.84, 4401570000, 1342.84)
 (datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 1314.88, 1333.68, 1314.14, 1329.1, 3687720000, 1329.1)
 (datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 1324.02, 1327.28, 1310.51, 1314.88, 3506510000, 1314.88)]

Update: If you don't want to modify numpy, just use the relevant numpy code for recfromcsv

I've also "fixed" the datetime issue by using a native python object in the datetime field. I don't know if that will work for you.

import datetime
import numpy as np

d = [('date',     datetime.datetime),
    ('open',      np.float64),
    ('high',      np.float64),
    ('low',       np.float64),
    ('close',     np.float64),
    ('volume',    np.int64),
    ('adj_close', np.float64)]

#a = np.recfromcsv("test.csv", dtype=d)
kwargs = {"dtype": d}
case_sensitive = kwargs.get('case_sensitive', "lower") or "lower"
names = kwargs.get('names', True)
kwargs.update(
    delimiter=kwargs.get('delimiter', ",") or ",",
    names=names,
    case_sensitive=case_sensitive)
output = np.genfromtxt("test.csv", **kwargs)
output = output.view(np.recarray)

print(output)

Getting Numpy overflow despite declaring dtype=int64

Answers (2)

Update: If you don't want to modify numpy, just use the relevant numpy code for recfromcsv

Related Questions