I'm downloading stock prices from Yahoo for the S&P 500, whose daily volume is too big for a 32-bit integer.
def yahoo_prices(ticker, start_date=None, end_date=None, data='d'):
    csv = yahoo_historical_data(ticker, start_date, end_date, data)
    d = [('date', np.datetime64),
         ('open', np.float64),
         ('high', np.float64),
         ('low', np.float64),
         ('close', np.float64),
         ('volume', np.int64),
         ('adj_close', np.float64)]
    return np.recfromcsv(csv, dtype=d)
Here's the error:
>>> sp500 = yahoo_prices('^GSPC')
Traceback (most recent call last):
  File "<stdin>", line 108, in <module>
  File "<stdin>", line 74, in yahoo_prices
  File "/usr/local/lib/python2.6/dist-packages/numpy/lib/npyio.py", line 1812, in recfromcsv
    output = genfromtxt(fname, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/numpy/lib/npyio.py", line 1646, in genfromtxt
    output = np.array(data, dtype=ddtype)
OverflowError: long int too large to convert to int
Why would I still get this error if I declared the dtype as int64? Does this indicate that the I/O function isn't actually using my dtype sequence d?
===Edit ... example csv added===
Date,Open,High,Low,Close,Volume,Adj Close
2012-06-15,1329.19,1343.32,1329.19,1342.84,4401570000,1342.84
2012-06-14,1314.88,1333.68,1314.14,1329.10,3687720000,1329.10
2012-06-13,1324.02,1327.28,1310.51,1314.88,3506510000,1314.88
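To see why the volume column needs 64 bits, the sample volumes above can be compared against the integer limits that NumPy reports through np.iinfo. This is a small sketch that only assumes NumPy is installed:

```python
import numpy as np

# Volumes copied from the sample CSV above.
volumes = [4401570000, 3687720000, 3506510000]

int32_max = np.iinfo(np.int32).max  # 2147483647
int64_max = np.iinfo(np.int64).max  # 9223372036854775807

# Every sample volume exceeds the int32 limit but fits comfortably in int64.
print(all(v > int32_max for v in volumes))   # True
print(all(v <= int64_max for v in volumes))  # True

# With an explicit int64 dtype the values are stored without overflow.
arr = np.array(volumes, dtype=np.int64)
print(arr.dtype)  # int64
```

So an int64 column is the right choice; the open question is whether the dtype actually reaches the parser.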
I'm not sure, but I think you found a bug in numpy. I filed it here.
As I said there, if you open npyio.py and change this line in recfromcsv:
kwargs.update(dtype=kwargs.get('update', None),
to this:
kwargs.update(dtype=kwargs.get('dtype', None),
Then the long integer loads without a problem for me. The original line looks up the key 'update' instead of 'dtype', so the dtype you pass is replaced with None before genfromtxt ever sees it. (I didn't check whether the dates come out right; Joe's answer covers that.) You may notice that your dates weren't being converted either. Here is the exact code that works; the contents of "test.csv" are copied from your example CSV data.
import numpy as np
d = [('date', np.datetime64),
('open', np.float64),
('high', np.float64),
('low', np.float64),
('close', np.float64),
('volume', np.int64),
('adj_close', np.float64)]
a = np.recfromcsv("test.csv", dtype=d)
print(a)
[ (datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 1329.19, 1343.32, 1329.19, 1342.84, 4401570000, 1342.84)
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 1314.88, 1333.68, 1314.14, 1329.1, 3687720000, 1329.1)
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 1324.02, 1327.28, 1310.51, 1314.88, 3506510000, 1314.88)]
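The effect of that single line can be reproduced in isolation. The two functions below are hypothetical stand-ins that copy only the dtype lookup from recfromcsv; they are not NumPy's actual source:

```python
# Hypothetical helpers mimicking only the dtype handling in recfromcsv.
def buggy_dtype_lookup(**kwargs):
    # Looks up the wrong key ('update'), so the caller's dtype is overwritten with None.
    kwargs.update(dtype=kwargs.get('update', None))
    return kwargs['dtype']


def fixed_dtype_lookup(**kwargs):
    # Looks up 'dtype', so the caller's dtype is passed through unchanged.
    kwargs.update(dtype=kwargs.get('dtype', None))
    return kwargs['dtype']


d = [('volume', 'i8')]
print(buggy_dtype_lookup(dtype=d))  # None
print(fixed_dtype_lookup(dtype=d))  # [('volume', 'i8')]
```

With dtype=None, genfromtxt guesses each column's type on its own, which would explain why the int64 request had no effect and the dates were not parsed.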
I've also "fixed" the datetime issue by using a native Python type (datetime.datetime) for the date field. I don't know whether that will work for you.
import datetime
import numpy as np
d = [('date', datetime.datetime),
('open', np.float64),
('high', np.float64),
('low', np.float64),
('close', np.float64),
('volume', np.int64),
('adj_close', np.float64)]
# Inline version of np.recfromcsv("test.csv", dtype=d), with the dtype
# actually passed through to genfromtxt:
kwargs = {"dtype": d}
case_sensitive = kwargs.get('case_sensitive', "lower") or "lower"
names = kwargs.get('names', True)
kwargs.update(
delimiter=kwargs.get('delimiter', ",") or ",",
names=names,
case_sensitive=case_sensitive)
output = np.genfromtxt("test.csv", **kwargs)
output = output.view(np.recarray)
print(output)
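Another route, sketched under the assumption of NumPy 1.14 or later (for the encoding argument of np.genfromtxt), is to call genfromtxt directly instead of recfromcsv, read the date column as a plain string, and convert it to datetime64 afterwards. The field names and the 'U10' string width are choices made for this sketch:

```python
import io

import numpy as np

# The example CSV from the question.
CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2012-06-15,1329.19,1343.32,1329.19,1342.84,4401570000,1342.84
2012-06-14,1314.88,1333.68,1314.14,1329.10,3687720000,1329.10
2012-06-13,1324.02,1327.28,1310.51,1314.88,3506510000,1314.88"""

# Read the date as a 10-character string; it is converted after loading.
d = [('date', 'U10'),
     ('open', np.float64),
     ('high', np.float64),
     ('low', np.float64),
     ('close', np.float64),
     ('volume', np.int64),
     ('adj_close', np.float64)]

# skip_header=1 drops the header row so the field names come from d.
data = np.genfromtxt(io.StringIO(CSV), delimiter=',', skip_header=1,
                     dtype=d, encoding='utf-8')

# ISO 8601 strings cast cleanly to day-resolution datetime64.
dates = data['date'].astype('datetime64[D]')
print(dates)
print(data['volume'])
```

Keeping the string-to-date conversion as a separate step means a malformed date fails loudly at the astype call instead of being silently guessed during parsing.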
You need to convert your date strings to actual dates. The formats in your dtype are being ignored because the first column can't be directly converted to a datetime.
numpy expects you to be fairly explicit and refuses to guess date formats. (Edit: This used to be the case, but isn't anymore.) It expects datetime objects. See dateutil.parser if you want to guess date/time formats from strings.
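As the edit note says, recent NumPy versions do parse ISO 8601 date strings such as the ones in this CSV without any format string. A minimal check, assuming a reasonably current NumPy:

```python
import numpy as np

# ISO 8601 date strings like those in the sample CSV.
raw = ['2012-06-15', '2012-06-14', '2012-06-13']

# datetime64 parses ISO 8601 strings directly; [D] sets day resolution.
dates = np.array(raw, dtype='datetime64[D]')
print(dates)

# Day-resolution dates support arithmetic directly.
print(dates[0] - dates[-1])  # 2 days
```

Non-ISO formats such as 06/15/2012 are not covered by this and still need an explicit converter.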
At any rate, you'll want something like the following:
from cStringIO import StringIO
import datetime as dt
import numpy as np
dat = """Date,Open,High,Low,Close,Volume,Adj Close
2012-06-15,1329.19,1343.32,1329.19,1342.84,4401570000,1342.84
2012-06-14,1314.88,1333.68,1314.14,1329.10,3687720000,1329.10
2012-06-13,1324.02,1327.28,1310.51,1314.88,3506510000,1314.88"""
infile = StringIO(dat)
d = [('date', np.datetime64),
('open', np.float64),
('high', np.float64),
('low', np.float64),
('close', np.float64),
('volume', np.int64),
('adj_close', np.float64)]
def parse_date(item):
    # '%m' is the month directive ('%M' would parse minutes)
    return dt.datetime.strptime(item, '%Y-%m-%d')
data = np.recfromcsv(infile, converters={0:parse_date}, dtype=d)
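The format string in parse_date has to use %m (month), not %M (minute). Mixing the two up does not raise an error; strptime silently uses its default month of January, as this standard-library check shows:

```python
import datetime as dt

s = '2012-06-15'

# %M means "minute": 06 lands in the minutes and the month defaults to January.
wrong = dt.datetime.strptime(s, '%Y-%M-%d')
# %m means "month", which is what this date format needs.
right = dt.datetime.strptime(s, '%Y-%m-%d')

print(wrong)  # 2012-01-15 00:06:00
print(right)  # 2012-06-15 00:00:00
```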
However, things like this are where pandas shines. Consider using something like the following:
from cStringIO import StringIO
import pandas
dat = """Date,Open,High,Low,Close,Volume,Adj Close
2012-06-15,1329.19,1343.32,1329.19,1342.84,4401570000,1342.84
2012-06-14,1314.88,1333.68,1314.14,1329.10,3687720000,1329.10
2012-06-13,1324.02,1327.28,1310.51,1314.88,3506510000,1314.88"""
infile = StringIO(dat)
data = pandas.read_csv(infile, index_col=0, parse_dates=True)