godzilla

Reputation: 3125

numpy too many indices for array error

I have a numpy object with the following format:

date,column1,column2,column3,column4,,column5,,column6,,column7,,column8,,column9,,column10
date,column1,column2,column3,column4,,column5,,column6,,column7,,column8,,column9,,column10
date,column1,column2,column3,column4,,column5,,column6,,column7,,column8,,column9,,column10
...

I am trying to retrieve only the rows that meet a date condition, for example all rows where the year is later than 2005, as follows (myData is a numpy object):

li = (myData[:,0] >  myData[2][0].year)

However, I keep getting the following error:

too many indices for array

The shape is (128,) and the dtype is [('Date', 'O'), ('SF1.AAPL_DEBT_MRQ - Value', '

Can someone please advise? Thanks in advance!

Upvotes: 0

Views: 6751

Answers (2)

hpaulj

Reputation: 231665

This looks like a structured array, most likely created by reading data from a CSV file (for example with np.genfromtxt). If so, it is probably 1-dimensional with a compound dtype, that is, one record per row with several named fields. Assuming the first field is called 'Date', you can get an array of all the dates with

myData['Date']

The data for the 1st row will be

myData[0]

The 1st date will either be myData[0]['Date'] or myData['Date'][0].
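As a minimal sketch of the indexing rules above: the three-row array below is a hypothetical stand-in for the asker's data (only the 'Date' field name comes from the question; the 'value' field and the dates are made up). It shows that the array has one dimension, that fields are selected by name, and that 2d-style indexing raises the reported error.

```python
import datetime
import numpy as np

# Hypothetical stand-in for myData: an object-typed 'Date' field plus one float field.
my_data = np.array(
    [(datetime.date(2004, 3, 31), 1.5),
     (datetime.date(2006, 6, 30), 2.5),
     (datetime.date(2007, 9, 30), 3.5)],
    dtype=[('Date', 'O'), ('value', 'f8')],
)

print(my_data.shape)      # a 1-tuple: the array is one-dimensional
print(my_data['Date'])    # all dates, selected by field name
print(my_data[0])         # the first record

# Field-then-row and row-then-field give the same element.
first_date = my_data['Date'][0]
same_date = my_data[0]['Date']

# Indexing with two indices, as in myData[:, 0], fails on a 1d array.
try:
    my_data[:, 0]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
    print(error_message)
```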


As I guessed, this is a 1d structured array:

shape = (128,)
dtype = [('Date', 'O'), ('SF1.AAPL_DEBT_MRQ - Value', '<f8'), ....]

The 'O' indicates that the date field holds Python objects, which could be anything, so I can't say anything about its format or content.

Try:

li = [date.year > 2005 for date in myData['Date']]

This should at least get the indexing right. I'm guessing that each date has a 'year' attribute that can be compared to 2005. The result should be a list of 128 True/False values.

Try using:

myData[li]

to get just the rows that meet your criteria. You may have to convert li to an array, or to a list of index numbers. Either way, myData is always indexed with a single value or a single list. The "too many indices" error means you are treating it like a 2d array when it is actually 1d.
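A minimal sketch of that selection, assuming each entry in the 'Date' field is a datetime.date (the question does not confirm what the objects are). The array, its 'value' field, and the variable names are hypothetical.

```python
import datetime
import numpy as np

# Hypothetical stand-in for myData; the real array has 128 rows and more fields.
my_data = np.array(
    [(datetime.date(2004, 3, 31), 1.5),
     (datetime.date(2006, 6, 30), 2.5),
     (datetime.date(2007, 9, 30), 3.5)],
    dtype=[('Date', 'O'), ('value', 'f8')],
)

# One True/False per row, built with a plain Python loop over the object field.
li = [date.year > 2005 for date in my_data['Date']]

# Convert the list to a boolean array and use it as a single 1d index.
mask = np.array(li)
selected = my_data[mask]

# The same rows, selected through their integer positions instead.
index_numbers = np.flatnonzero(mask)
selected_by_index = my_data[index_numbers]

print(selected)
```

Both forms index the array once along its only axis, which is why neither triggers the "too many indices" error.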


gboffi's data can be read, with field names, as

data = np.genfromtxt('puff.csv', dtype=None,  delimiter=',', names=True)

with the resulting dtype

dtype([('date', 'S10'), ('pippo', '<f8'), ('pluto', '<f8'), ('paperino', '<f8')])

The desired rows can be found with:

# the 'date' field can be selected before or after element selection
I = [x[:4] < '2014' for x in data['date']]
# I is [True, True, True, False]
data[np.array(I)]

numpy has a datetime64 type that can be used for comparisons:

# wrap the list in an array so the comparison is elementwise
dates = np.array([np.datetime64(x) for x in data['date']])
I = dates < np.datetime64('2014-01-01')
# array([ True,  True,  True, False], dtype=bool)
data[I]
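The datetime64 comparison can also be sketched without reading a file. The array below reproduces gboffi's four rows by hand, and it stores the dates as 'U10' text rather than the 'S10' bytes shown above; that choice is an assumption made so the example behaves the same on Python 3.

```python
import numpy as np

# gboffi's sample rows, built directly instead of via genfromtxt.
# Assumption: the date field holds ISO-formatted text (YYYY-MM-DD).
data = np.array(
    [('2012-10-20', 3.0, 5.0, 6.0),
     ('2013-05-22', 4.0, 6.0, 2.0),
     ('2013-07-31', 5.0, 1.0, 6.0),
     ('2014-10-08', 0.0, 3.0, 4.0)],
    dtype=[('date', 'U10'), ('pippo', 'f8'), ('pluto', 'f8'), ('paperino', 'f8')],
)

# Convert the whole string field to day-resolution datetime64 in one step.
dates = data['date'].astype('datetime64[D]')

# Elementwise comparison yields a boolean array usable as a 1d index.
mask = dates < np.datetime64('2014-01-01')
selected = data[mask]

# Calendar years as integers: datetime64[Y] counts years since 1970.
years = dates.astype('datetime64[Y]').astype(int) + 1970

print(selected)
print(years)
```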

If the date format is correct, genfromtxt can do the string to date conversion:

In [206]: data = np.genfromtxt(txt, dtype=('datetime64[D]','f8','f8','f8'),  delimiter=',', names=True)
In [207]: data
Out[207]: 
array([(datetime.date(2012, 10, 20), 3.0, 5.0, 6.0),
       (datetime.date(2013, 5, 22), 4.0, 6.0, 2.0),
       (datetime.date(2013, 7, 31), 5.0, 1.0, 6.0),
       (datetime.date(2014, 10, 8), 0.0, 3.0, 4.0)], 
      dtype=[('date', '<M8[D]'), ('pippo', '<f8'), ('pluto', '<f8'), ('paperino', '<f8')])

And the year selection can be done with:

In [208]: data[data['date']<np.datetime64('2014','Y')]
Out[208]: 
array([(datetime.date(2012, 10, 20), 3.0, 5.0, 6.0),
       (datetime.date(2013, 5, 22), 4.0, 6.0, 2.0),
       (datetime.date(2013, 7, 31), 5.0, 1.0, 6.0)], 
      dtype=[('date', '<M8[D]'), ('pippo', '<f8'), ('pluto', '<f8'), ('paperino', '<f8')])

Or even a date selection:

In [209]: data[data['date']<np.datetime64('2013-06-01','D')]
Out[209]: 
array([(datetime.date(2012, 10, 20), 3.0, 5.0, 6.0),
       (datetime.date(2013, 5, 22), 4.0, 6.0, 2.0)], 
      dtype=[('date', '<M8[D]'), ('pippo', '<f8'), ('pluto', '<f8'), ('paperino', '<f8')])

Upvotes: 1

gboffi

Reputation: 25093

This builds on @hpaulj's answer; the missing step I've added is converting the list of booleans to an ndarray.

% cat puff.csv
date,pippo,pluto,paperino
2012-10-20,3.,5.,6.
2013-05-22,4.,6.,2.
2013-07-31,5.,1.,6.
2014-10-08,0.,3.,4.
% ipython
Python 2.7.8 (default, Oct 18 2014, 12:50:18) 
Type "copyright", "credits" or "license" for more information.

IPython 2.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy as np

In [2]: l = np.genfromtxt('puff.csv', dtype=None,  delimiter=',', skip_header=1)

In [3]: print l
[('2012-10-20', 3.0, 5.0, 6.0) ('2013-05-22', 4.0, 6.0, 2.0)
 ('2013-07-31', 5.0, 1.0, 6.0) ('2014-10-08', 0.0, 3.0, 4.0)]

In [4]: l[np.array([x[0][:4]<'2014' for x in l])]
Out[4]: 
array([('2012-10-20', 3.0, 5.0, 6.0), ('2013-05-22', 4.0, 6.0, 2.0),
       ('2013-07-31', 5.0, 1.0, 6.0)], 
      dtype=[('f0', 'S10'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])

In [5]: print l[np.array([x[0][:4]<'2014' for x in l])]
[('2012-10-20', 3.0, 5.0, 6.0) ('2013-05-22', 4.0, 6.0, 2.0)
 ('2013-07-31', 5.0, 1.0, 6.0)]

In [6]: 

Upvotes: 1
