Reputation: 3125
I have a numpy object with the following format:
date,column1,column2,column3,column4,,column5,,column6,,column7,,column8,,column9,,column10
date,column1,column2,column3,column4,,column5,,column6,,column7,,column8,,column9,,column10
date,column1,column2,column3,column4,,column5,,column6,,column7,,column8,,column9,,column10
...
I am trying to retrieve only the rows that meet a date condition, such as all rows where the year is greater than 2005, as follows (myData is a numpy object):
li = (myData[:,0] > myData[2][0].year)
However, I keep getting the following error:
too many indices for array,
the shape is (128,) dtype is [('Date', 'O'), ('SF1.AAPL_DEBT_MRQ - Value', '
Can someone please advise? Thanks in advance!
Upvotes: 0
Views: 6751
Reputation: 231665
This looks like a structured array, most likely created by reading data from a csv (with np.genfromtxt). If so, it is probably 1-dimensional with a compound dtype. Assuming the first field is called 'Date', you can get an array of all the dates with
myData['Date']
The data for the 1st row will be
myData[0]
The 1st date will be either myData[0]['Date'] or myData['Date'][0].
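The indexing rules above can be tried on a small stand-in array. This is only a sketch: the two-field dtype, the `value` field name, and the sample dates are made up for illustration and are not the asker's data.

```python
import datetime
import numpy as np

# Hypothetical two-field dtype standing in for the asker's wider one.
dt = np.dtype([('Date', 'O'), ('value', '<f8')])
my_data = np.array(
    [(datetime.date(2004, 3, 31), 1.5),
     (datetime.date(2005, 6, 30), 2.5),
     (datetime.date(2006, 9, 30), 3.5)],
    dtype=dt,
)

print(my_data.shape)    # a single dimension: one entry per row
print(my_data['Date'])  # every value of the 'Date' field
print(my_data[0])       # the first row as one record

# Row-then-field and field-then-row reach the same element.
first_a = my_data[0]['Date']
first_b = my_data['Date'][0]

# 2-d style indexing fails because there is only one axis.
try:
    my_data[:, 0]
except IndexError as exc:
    error_message = str(exc)
    print(error_message)
```

The fields behave like named columns, but they are not a second axis, which is why `my_data[:, 0]` is rejected.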
As I guessed, this is a 1d structured array:
shape = (128,)
dtype = [('Date', 'O'), ('SF1.AAPL_DEBT_MRQ - Value', '<f8'), ...]
The 'O' indicates the date is an object, which could be anything, so I can't say anything about its format or content.
Try:
li = [date.year > 2005 for date in myData['Date']]
This should at least get the indexing right. I'm guessing that each 'date' has a 'year' attribute that can be compared to 2005. This should give a list of 128 True/False values.
Try using:
myData[li]
to get just the rows that meet your criteria. You may have to convert li to an array, or to a list of index numbers. But regardless, myData will always be indexed with one value or list. The too many indices error means you are treating it like a 2d array when it is actually just 1d.
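Put together, the selection might look like the sketch below. It assumes the 'Date' field holds `datetime.date` objects (the question does not confirm this), and the `value` field and sample rows are invented for the example.

```python
import datetime
import numpy as np

# Made-up stand-in for myData: an object 'Date' field plus one float field.
my_data = np.array(
    [(datetime.date(2004, 3, 31), 1.5),
     (datetime.date(2005, 6, 30), 2.5),
     (datetime.date(2006, 9, 30), 3.5),
     (datetime.date(2007, 12, 31), 4.5)],
    dtype=[('Date', 'O'), ('value', '<f8')],
)

# Build the True/False list, then convert it to a boolean array.
mask = np.array([d.year > 2005 for d in my_data['Date']])
selected = my_data[mask]

# Equivalent selection using row numbers instead of a boolean mask.
row_numbers = np.flatnonzero(mask)
same_rows = my_data[row_numbers]

print(selected)
```

Either form indexes the array with a single 1-d index, which is all a 1-d structured array accepts.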
gboffi's data can be read, with field names, as
data = np.genfromtxt('puff.csv', dtype=None, delimiter=',', names=True)
with the resulting dtype
dtype([('date', 'S10'), ('pippo', '<f8'), ('pluto', '<f8'), ('paperino', '<f8')])
The desired rows can be found with:
I=[x[:4]<'2014' for x in data['date']]
# the 'date' field can be selected before or after element selection
# [True, True, True, False]
data[np.array(I)]
numpy has a datetime64 type that can be used for comparisons:
dates = np.array([np.datetime64(x) for x in data['date']])
I = dates < np.datetime64('2014-01-01')
# array([ True, True, True, False], dtype=bool)
data[I]
If the date format is correct, genfromtxt can do the string-to-date conversion:
In [206]: data = np.genfromtxt(txt, dtype=('datetime64[D]','f8','f8','f8'), delimiter=',', names=True)
In [207]: data
Out[207]:
array([(datetime.date(2012, 10, 20), 3.0, 5.0, 6.0),
(datetime.date(2013, 5, 22), 4.0, 6.0, 2.0),
(datetime.date(2013, 7, 31), 5.0, 1.0, 6.0),
(datetime.date(2014, 10, 8), 0.0, 3.0, 4.0)],
dtype=[('date', '<M8[D]'), ('pippo', '<f8'), ('pluto', '<f8'), ('paperino', '<f8')])
And the year selection can be done with:
In [208]: data[data['date']<np.datetime64('2014','Y')]
Out[208]:
array([(datetime.date(2012, 10, 20), 3.0, 5.0, 6.0),
(datetime.date(2013, 5, 22), 4.0, 6.0, 2.0),
(datetime.date(2013, 7, 31), 5.0, 1.0, 6.0)],
dtype=[('date', '<M8[D]'), ('pippo', '<f8'), ('pluto', '<f8'), ('paperino', '<f8')])
Or even a date selection:
In [209]: data[data['date']<np.datetime64('2013-06-01','D')]
Out[209]:
array([(datetime.date(2012, 10, 20), 3.0, 5.0, 6.0),
(datetime.date(2013, 5, 22), 4.0, 6.0, 2.0)],
dtype=[('date', '<M8[D]'), ('pippo', '<f8'), ('pluto', '<f8'), ('paperino', '<f8')])
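The sessions above appear to come from Python 2. On Python 3 with a NumPy version that supports the `encoding` argument of genfromtxt, one possible variant is to read the date column as text and convert it afterwards. This is a sketch rather than the method used above; the in-memory `csv_text` simply mirrors the sample file from the other answer.

```python
import io
import numpy as np

# Same sample rows as puff.csv, held in memory for the example.
csv_text = """date,pippo,pluto,paperino
2012-10-20,3.,5.,6.
2013-05-22,4.,6.,2.
2013-07-31,5.,1.,6.
2014-10-08,0.,3.,4.
"""

# dtype=None lets genfromtxt infer a type per column; the dates arrive as text.
data = np.genfromtxt(io.StringIO(csv_text), dtype=None, delimiter=',',
                     names=True, encoding='utf-8')

# Convert the ISO-formatted date strings to day-resolution datetime64 values.
dates = data['date'].astype('datetime64[D]')

# Select by year, or by a specific date.
before_2014 = data[dates < np.datetime64('2014')]
before_june_2013 = data[dates < np.datetime64('2013-06-01')]

print(before_2014)
print(before_june_2013)
```

Keeping the converted dates in a separate array leaves the original records unchanged while still allowing vectorized comparisons.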
Upvotes: 1
Reputation: 25093
This builds on the answer of @hpaulj; the missing step I've added is converting the list of booleans to an ndarray:
% cat puff.csv
date,pippo,pluto,paperino
2012-10-20,3.,5.,6.
2013-05-22,4.,6.,2.
2013-07-31,5.,1.,6.
2014-10-08,0.,3.,4.
% ipython
Python 2.7.8 (default, Oct 18 2014, 12:50:18)
Type "copyright", "credits" or "license" for more information.
IPython 2.3.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import numpy as np
In [2]: l = np.genfromtxt('puff.csv', dtype=None, delimiter=',', skip_header=1)
In [3]: print l
[('2012-10-20', 3.0, 5.0, 6.0) ('2013-05-22', 4.0, 6.0, 2.0)
('2013-07-31', 5.0, 1.0, 6.0) ('2014-10-08', 0.0, 3.0, 4.0)]
In [4]: l[np.array([x[0][:4]<'2014' for x in l])]
Out[4]:
array([('2012-10-20', 3.0, 5.0, 6.0), ('2013-05-22', 4.0, 6.0, 2.0),
('2013-07-31', 5.0, 1.0, 6.0)],
dtype=[('f0', 'S10'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
In [5]: print l[np.array([x[0][:4]<'2014' for x in l])]
[('2012-10-20', 3.0, 5.0, 6.0) ('2013-05-22', 4.0, 6.0, 2.0)
('2013-07-31', 5.0, 1.0, 6.0)]
Upvotes: 1