Cocoabuff
Cocoabuff

Reputation: 21

Update submatrix with R-like or MATLAB-like syntax in NumPy and Python

I was a R user and I am learning Python (numpy in particular), but I cannot perform a simple task of updating a submatrix in Python which can be very easily done in R.

So I have 2 problems.

First one is say we have a 4 by 4 matrix

A = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])

and a 2 by 2 matrix

B = np.array([[100,200],[300,400]]).

I want to grab a 2 by 2 submatrix of A formed by taking 2nd and 4th rows and columns (array([[6,8][14,16]])) and replace it to B.

I can pull the correct matrix by doing

m = [1,3]
A[m][:,m]

but nothing happens to A even after I update it to B. That is

A[m][:,m] = B
print A

and A comes out to be the same.

Is there a way to do this without using loops or maybe with a relatively simple code?

The second problem which is relatively easy is that in R, we can subset matrix with True and False. From above A, we can subset the same 2 by 2 matrix by

m = [F, T, F, T]
A[m,m]

However, in Python the same code does not seem to work because True is 1 and False is 0. I think I can convert [F,T,F,T] to [1,3] and subset, but I thought there may be a one step method to do this.

Is there a easy way to do the same operation in Python when the index is given in terms of True and False?

Upvotes: 2

Views: 582

Answers (1)

ely
ely

Reputation: 77464

For part 1, from NumPy for MATLAB Users, there are examples showing both read-only and mutable access to arbitrary slices.

The read-only pattern is similar to what you already describe, A[:, m][m]. This slices the columns first, then the rows, and provides a read-only view of the returned data.

To obtain clean indices for mutating the sub-array, a convenience function is provided, np.ix_. It will stitch together its arguments into an R-like or MATLAB-like slice:

indxs = np.ix_([1,3], [1,3])
A[indxs] = B

The reason behind this is that NumPy follows certain shape-conformability rules (called "broadcasting" rules) about how to infer the shapes you intended based on the shapes present in the data. When NumPy does this for a row index and column index pair, it tries to pair them up element-wise.

So A[[1, 3], [1, 3]] under NumPy's chosen conventions, is interpreted as "Fetch for me the value of A at index (1,1) and at index (3,3)." Which is different than the conventions for this same syntax in MATLAB, Octave, or R.

If you want to get around this manually, without np.ix_, you still can, but you must write down your indices to take advantage of NumPy's broadcasting rules. What this means is you have to give NumPy a reason to believe that you want a 2x2 grid of indices instead of a 1x2 list of two specific points.

You can trick it into believing this by making your row entries into lists themselves: rows = [[1], [3]]. Now when NumPy examines the shape of this (1 x 2 instead of 1 x nothing) it will say, 'aha, the columns had better also be 1 x 2' and automatically promote the list of columns to match individually with each possible row. That's why this also will work:

A[[[1], [3]], [1, 3]] = B

For the second part of your question, the issue is that you want to let NumPy know that your array of [False, True, False, True] is a boolean array and should not be implicitly cast as any other type of array.

This can be done in many ways, but one easy way is to construct an np.array of your boolean values, and its dtype will be bool:

indxs = np.array([False, True, False, True])
print A[:, indxs][indxs] # remember, this one is read only

A[np.ix_(indxs, indxs)] = B

Another helpful NumPy convenience tool is np.s_, which is not a function (it is an instance of numpy.lib.index_tricks.IndexExpression) but can be used kind of like one.

np.s_ allows you to use the element-getting syntax (called getitem syntax in Python, after the __getitem__ method that any new-style class instances will have). By way of example:

In [60]: np.s_[[1,3], [1,3]]
Out[60]: ([1, 3], [1, 3])

In [61]: np.s_[np.ix_([1,3], [1,3])]
Out[61]: 
(array([[1],
       [3]]), array([[1, 3]]))

In [62]: np.s_[:, [1,3]]
Out[62]: (slice(None, None, None), [1, 3])

In [63]: np.s_[:, :]
Out[63]: (slice(None, None, None), slice(None, None, None))

In [64]: np.s_[-1:1:-2, :]
Out[64]: (slice(-1, 1, -2), slice(None, None, None))

So np.s_ basically just mirrors back to what the slice index object will look like if you were to place it inside the square brackets in order to access some array's data.

In particular, the first two of these np.s_ examples shows you the difference between plain A[[1,3], [1,3]] and the use of np.ix_([1,3], [1,3]) and how they result in different slices.

Upvotes: 1

Related Questions