David G.

Reputation: 11

Python and Numeric/numpy Array Slicing Behavior

On Python 2.4, the single-colon slice operator : works as expected on Numeric matrices: it returns all values along the dimension it is applied to, for example all X and/or Y values of a 2-D matrix.

On Python 2.6, the single-colon slice operator behaves differently in some cases. For example, on a regular 2-D MxN matrix, m[:] can return zeros(<some shape tuple>, 'l'). One would expect the full matrix, which is what Python 2.4 returns.

Using either a double colon :: or 3 dots ... in Python2.6, instead of a single colon, seems to fix this issue and return the proper matrix slice.

After some guessing, I discovered that you get the same zeros output when passing 0 as the stop index: m[<any index>:0] returns the same "zeros" output as m[:]. Is there any way to debug which indices are being used when evaluating m[:]? Or did something change between the two Python versions (2.4 to 2.6) that would affect the behavior of the slicing operators?

The version of Numeric being used (24.2) is the same between both versions of Python. Why does the single colon slicing NOT work on Python 2.6 the same way it works with version 2.4?

Python2.6:

>>> a = array([[1,2,3],[4,5,6]])
>>> a[:]
zeros((0, 3), 'l')    # unexpected: empty result

>>> a[::]
array([[1,2,3],[4,5,6]])

>>> a[...]
array([[1,2,3],[4,5,6]])

Python2.4:

>>> a = array([[1,2,3],[4,5,6]])
>>> a[:]
array([[1,2,3],[4,5,6]])    # expected: full array

>>> a[::]
array([[1,2,3],[4,5,6]])

>>> a[...]
array([[1,2,3],[4,5,6]])

(I typed the "code" up from scratch, so it may not be fully accurate syntax or printout-wise, but shows what's happening)

Upvotes: 0

Views: 257

Answers (2)

David G.

Reputation: 11

The problem seems to be an integer overflow. In the Numeric source code, the matrix data structure being used lives in a file called MA.py, in a class called MaskedArray. There is a line at the end of the class that sets the "array()" function to this class. I had a lot of trouble finding this information, but it turned out to be critical.

There is also a __getslice__(self, i, j) method in the MaskedArray class that takes the start/stop indices and returns the corresponding slice. After adding debug output for those indices, I found that in the good case (Python 2.4), slicing an entire array passes start/stop indices of 0 and 2^31-1, respectively. Under Python 2.6, the stop index passed is 2^63-1 instead.
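One way to see which indices the interpreter actually passes is a small stand-in class that records the arguments to its indexing methods. The sketch below is hypothetical (IndexSpy is not part of Numeric) and only illustrates the dispatch. On Python 2, a plain a[:] goes to __getslice__ with integer bounds if the class defines it, while a[::] and a[...] go to __getitem__ with a slice object or Ellipsis, which would explain why those two forms avoid the problem. On Python 3, __getslice__ is never called and all three forms reach __getitem__.

```python
class IndexSpy(object):
    """Hypothetical helper: reports how an indexing expression is dispatched."""

    def __getitem__(self, key):
        # Receives slice objects, Ellipsis, integers, or tuples of these.
        return ('getitem', key)

    def __getslice__(self, i, j):
        # Python 2 only: called for simple a[i:j] slices with integer bounds.
        return ('getslice', (i, j))


spy = IndexSpy()
print(spy[:])    # Python 2: getslice with (0, sys.maxint); Python 3: getitem
print(spy[::])   # getitem with slice(None, None, None)
print(spy[...])  # getitem with Ellipsis
```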

Somewhere, probably in the Numeric source/library code, only 32 bits are used to store the stop index when slicing arrays. Hence the 2^63-1 value overflows (as would any value of 2^31 or greater). The output slice in these bad cases ends up equivalent to slicing from start 0 to stop 0, i.e. an empty matrix. Slicing with [0:-1] does return a valid slice, and I think 2^63-1 interpreted as a signed 32-bit number comes out to -1. I'm not sure why slicing from 0 to 2^63-1 gives the same result as slicing from 0 to 0 (an empty matrix) rather than from 0 to -1 (which returns at least some output).

However, when I pass a stop index that overflows (i.e. 2^31 or greater) but whose lower 32 bits form a valid positive non-zero number, I do get a valid slice back. E.g. a stop index of 2^33+1 returns the same slice as a stop index of 1, because the lower 32 bits are 1 in both cases.
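The arithmetic above can be checked with a short sketch. It assumes the stop index is truncated to its low 32 bits and read as a signed two's-complement integer. Whether Numeric does exactly this is a guess, and as_signed_32 is a hypothetical helper, not part of any library.

```python
def as_signed_32(value):
    """Keep the low 32 bits of value and reinterpret them as a signed int."""
    low = value & 0xFFFFFFFF
    # Values with the top bit set represent negative numbers.
    return low - 2**32 if low >= 2**31 else low


print(as_signed_32(2**31 - 1))  # 2147483647 (fits, unchanged)
print(as_signed_32(2**63 - 1))  # -1
print(as_signed_32(2**33 + 1))  # 1
```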

Python 2.4 Example code:

>>> a = array([[1,2,3],[4,5,6]])
>>> a[:]             # (which actually becomes a[0:2^31-1])
[[1,2,3],[4,5,6]]    # correct, expect the entire array

Python 2.6 Example code:

>>> a = array([[1,2,3],[4,5,6]])
>>> a[:]             # (which actually becomes a[0:2^63-1])
zeros((0, 3), 'l')   # incorrect b/c of overflow, should be full array
>>> a[0:0]
zeros((0, 3), 'l')   # correct, b/c slicing range is null
>>> a[0:2**33+1]
[ [1,2,3]]           # incorrect b/c of overflow, should be full array
                     # although it returned some data b/c the
                     # lower 32 bits of (2^33+1) = 1
>>> a[0:-1]
[ [1,2,3]]           # correct, although I'm not sure why "a[:]" doesn't
                     # give this output as well, given that the lower 32
                     # bits of 2^63-1 equal -1

Upvotes: 1

hpaulj

Reputation: 231385

I think I was using 2.4 10 years ago. I used numpy back then, but may have added Numeric for its NETCDF capabilities. But the details are fuzzy. And I don't have any of those versions now for testing.

Python documentation back then should be easy to explore. numpy/Numeric documentation was skimpier.

I think Python has always had the basic : slicing for lists: alist[:] to make a copy, alist[1:-1] to slice off the first and last elements, etc.

I don't know when the step was added, e.g. alist[::-1] to reverse a list.
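For reference, this is how those list slices behave on a current Python 3 (a minimal sketch; the variable names are only illustrative):

```python
alist = list(range(6))   # [0, 1, 2, 3, 4, 5]

copy = alist[:]          # shallow copy of the whole list
middle = alist[1:-1]     # drop the first and last elements
backwards = alist[::-1]  # a step of -1 reverses the list

print(copy)           # [0, 1, 2, 3, 4, 5]
print(middle)         # [1, 2, 3, 4]
print(backwards)      # [5, 4, 3, 2, 1, 0]
print(copy is alist)  # False: the slice is a new list object
```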

Python started to recognize indexing tuples at the request of numeric developers, e.g. arr[2,4], arr[(2,4)], arr[:, [1,2]], arr[::-1, :]. But I don't know when that appeared.

Ellipsis is also mainly of value for multidimensional indexing. The Python interpreter recognizes ..., but lists don't handle it. About the same time, the : notation was formally implemented as a slice object.

In 3.5, we can reverse a list by passing a slice directly:

In [6]: list(range(10)).__getitem__(slice(None,None,-1))
Out[6]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

I would suggest a couple of things:

  • make sure you understand numpy (and list) indexing/slicing in a current system

  • try the same things in the older versions; ask SO questions with concrete examples of the differences. Don't count on any of us to have memories of the old code.

  • study the documentation to find when suspected features were changed or added.

Upvotes: 0
