Reputation: 6152
I have a NumPy array that is constructed from a text file. I've been doing things this way for weeks and have never seen this problem before.
print data
print data[:, 1:]
outputs
[['1', '200', '300', '400', '500\n']
['3', '500', '400', '200', '1000\n']
['14', '900', '200', '300', '100\n'] ...,
['999142', '24', '21', '20', '12\n']]
Traceback (most recent call last):
File ...., line ..., in ....
print data[:, 1:]
IndexError: too many indices
Why is this happening and how can I fix it?
Edit: Big clue. data.shape is (3313869,), with no second value. data.ndim is 1. len(data[1]), however, is 5.
Edit: I am constructing it with
data = [re.split(' ', line) for line in f]
f.close()
data = np.array(data)
When I interject
f.close()
print data[0:10]
it gives, e.g.:
[['1', '200', '300', '400', '500\n'], ['3', .... ]]
Upvotes: 1
Views: 801
Reputation: 59005
The problem happens because your code is somehow creating a numpy.array of objects. See this question with a similar issue. When that happens you get something like:
a = numpy.array([list1, list2, list3, ... , listn], dtype=object)
It is a 1D array, but when you print it, the __str__ of each list inside is called, giving:
[[ 1, 2, 3, 4],
[ 5, 6, 7, 8]]
which seems like a 2D array.
You can reproduce it with:
import numpy
a = ['aaa' for i in range(10)]
b = numpy.empty(5, dtype=object)
b.fill(a)
Let's check b:
b.shape # (5,)
b.ndim # 1
but print b gives:
[['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']
['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']
['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']
['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']
['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']]
Quite tricky...
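One common way to end up with such an object array is a list of rows whose lengths differ. The following is a minimal sketch of that case, not the asker's actual data: the `ragged` and `uniform` lists are made up for illustration, and it is written for Python 3. `dtype=object` is passed explicitly because recent NumPy versions refuse to build a ragged array implicitly.

```python
import numpy as np

# Hypothetical rows: the second row of `ragged` has one extra (empty) token.
ragged = [["1", "200", "300"], ["3", "500", "400", ""]]
uniform = [["1", "200", "300"], ["3", "500", "400"]]

a = np.array(ragged, dtype=object)  # 1-D array whose elements are Python lists
b = np.array(uniform)               # genuine 2-D array of strings

print(a.shape, a.ndim)  # (2,) 1
print(b.shape, b.ndim)  # (2, 3) 2

# Two indices on a 1-D array fail, which is the error from the question.
try:
    a[:, 1:]
    raised = False
except IndexError:
    raised = True

print(raised)          # True
print(b[:, 1:].shape)  # (2, 2)
```

Checking `a.dtype == object` or `a.ndim` right after building the array is a cheap way to catch this before any slicing.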
Upvotes: 1
Reputation: 6152
I tracked this down with
for line in data:
    if len(line) != 5:
        print len(line)
        print line
A few of the lines in my data had spaces at the end, which led to 500 and \n being split into separate tokens. This snuck in on Friday, the last time I touched this code: I had added a default option to the Python script that builds the input files for this script, for rows that were missing a particular value, and Vim inserted a space at the line wrap, which happened to fall on the character right before \n.
[re.split(' ', line.replace('\n', '').rstrip()) for line in f]
gives the desired result.
It is a little strange, I think, that NumPy treats the array as both 1-D and 2-D (allowing me to select data[1] as a row), but I guess that if the rows aren't of consistent length it just sees an array of arrays rather than a 2-D array, and makes a distinction between the two.
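As a hedged alternative to the fix above, the parsing step could validate row lengths up front so that a stray token fails loudly instead of silently producing a 1-D array. This sketch swaps `re.split(' ', ...)` for `str.split()` with no arguments, which splits on any run of whitespace and ignores leading and trailing whitespace (so it also treats tabs as separators). The function name `parse_rows`, its `n_cols` parameter, and the sample lines are hypothetical, and the code is written for Python 3.

```python
import numpy as np

def parse_rows(lines, n_cols=5):
    """Split each line on whitespace and insist on exactly n_cols tokens."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        # No argument: drops the trailing "\n" and any trailing or repeated spaces.
        tokens = line.split()
        if len(tokens) != n_cols:
            raise ValueError("line %d has %d tokens, expected %d: %r"
                             % (lineno, len(tokens), n_cols, line))
        rows.append(tokens)
    # Every row has the same length here, so this is a true 2-D array.
    return np.array(rows)

# Made-up sample; the second line has a trailing space before the newline.
sample = ["1 200 300 400 500\n", "3 500 400 200 1000 \n"]
data = parse_rows(sample)
print(data.shape)        # (2, 5)
print(data[:, 1:].shape)  # (2, 4)
```

With a real file, the same function could be called as `parse_rows(f)` inside a `with open(...) as f:` block, since iterating a file object yields its lines.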
Upvotes: 0