NumPy thinks a 2-D array is 1-D

Question

I have a NumPy array that is constructed from a text file. I've been doing things this way for weeks and never seen this problem before.

print data
print data[:, 1:]

outputs

[['1', '200', '300', '400', '500\n']
 ['3', '500', '400', '200', '1000\n']
 ['14', '900', '200', '300', '100\n'] ...,
 ['999142', '24', '21', '20', '12\n']]
Traceback (most recent call last):
File ...., line ..., in ....
print data[:, 1:]
IndexError:  too many indices

Why is this happening and how can I fix it?

Edit: Big clue. data.shape is (3313869,) with no second value.

data.ndim is 1.

len(data[1]), however, is 5.

Edit: I am constructing it with

data = [re.split(' ', line) for line in f]
f.close()
data = np.array(data)

When I interject

f.close()
print data[0:10]

it gives, for example:

[['1', '200', '300', '400', '500\n'], ['3', .... ]]

Saullo G. P. Castro · Accepted Answer

The problem occurs because your code is somehow creating a numpy.array of objects (dtype=object) in which each element is a Python list. See this question with a similar issue. When that happens you get something like:

a = numpy.array([list1, list2, list3, ... , listn], dtype=object)

It is a 1-D array, but when you print it, the __str__ of each list inside is called, giving:

[[ 1, 2, 3, 4],
 [ 5, 6, 7, 8]]

which looks like a 2-D array.

You can simulate it with:

import numpy

a = ['aaa' for i in range(10)]
b = numpy.empty(5, dtype=object)  # 1-D array with 5 object slots
b.fill(a)                         # every slot references the same list

Let's check b:

b.shape # (5,)
b.ndim  # 1

but print b gives:

[['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']
 ['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']
 ['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']
 ['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']
 ['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']]

Quite tricky...
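
One likely cause (an assumption, since the file itself is not shown) is that at least one line splits into a different number of fields than the others, so the rows cannot form a rectangular array. Below is a minimal sketch, in Python 3 syntax, of how you might locate such rows and build a real 2-D array from the well-formed ones. The sample lines and the names rows, expected, bad and good are made up for illustration.

```python
import numpy as np

# Hypothetical file contents: the third line has one field fewer than the others.
lines = ["1 200 300 400 500\n", "3 500 400 200 1000\n", "14 900 200 300\n"]

# split() with no argument splits on any whitespace and drops the trailing newline.
rows = [line.split() for line in lines]

# Report (index, length) for rows whose field count differs from the first row's.
expected = len(rows[0])
bad = [(i, len(r)) for i, r in enumerate(rows) if len(r) != expected]
print(bad)

# Keep only the well-formed rows and convert the fields to integers.
good = [[int(x) for x in r] for r in rows if len(r) == expected]
data = np.array(good)

print(data.shape)   # two dimensions now, so column slicing works
print(data[:, 1:])
```

Whether to drop, pad or repair the offending rows depends on what they mean in your data; the sketch simply drops them.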
