Reputation: 58
I'm using a 3-dimensional array that is defined like this:
x = np.zeros((dim1, dim2, dim3), dtype=np.float32)
After inserting some data I need to apply a function only if values in specific columns are still zero. The columns I'm interested in are selected by this array of indexes:
scale_idx = np.array([0,1,3])
So what I'm trying to do is use indexing to select those rows and columns.
At first I tried this, using a boolean mask for the first 2 dimensions and an array for the third:
x[x[:,:,scale_idx].any(axis=2), scale_idx]
but I get this error:
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,) (3,)
If I change the last index to : I get all the rows I'm interested in, but I also get all the possible columns. I was expecting that the last array would act as an indexer, as explained in https://docs.scipy.org/doc/numpy/user/basics.indexing.html.
x[x[:,:,scale_idx].any(axis =2)]
My scale_idx should be interpreted as column indexes, but it is actually interpreted as row indexes. Since only 2 rows satisfy the condition but I have 3 indexes, I get an IndexError.
I have found a workaround to this using
x[x[:,:,scale_idx].any(axis =2)][:,:,scale_idx]
but it's kind of ugly and, since the result is a copy rather than a view, I can't use it to modify the original array.
Anybody willing to explain to me what I'm doing wrong?
EDIT: Thanks to @hpaulj I've managed to isolate the cells I need. After that I created a matrix with the same shape as the selected values and assigned it to the masked cells. To my surprise, the new values are not the ones I just set but some seemingly random integers, and I can't figure out where they came from. Code to reproduce:
scale_idx = np.array([0,3,1])
b = x[:,:,scale_idx].any(axis =2)
I, J = np.nonzero(b)
x[I[:,None], J[:,None], scale_idx] #this selects the correct cells
>>>
array([[ 50, 50, 50],
[100, 100, 100],
[100, 100, 100]])
scaler.transform(x[I[:,None], J[:,None], scale_idx]) #sklearn standard scaler, returns a matrix with the scaled values
>>>
array([[-0.50600345, -0.5445559 , -1.2957878 ],
[-0.50600345, -0.25915199, -1.22266904],
[-0.50600345, -0.25915199, -1.22266904]])
x[I[:,None], J[:,None], scale_idx] = scaler.transform(x[I[:,None], J[:,None], scale_idx]) #assign the new values to the selected cells
x[I[:,None], J[:,None], scale_idx] #check the new values
array([[0, 2, 0],
[0, 6, 2],
[0, 6, 2]])
Why are the new values different from what I'm expecting?
Upvotes: 1
Views: 912
Reputation: 231385
Let's take the 3d boolean mask example from the indexing docs:
In [135]: x = np.arange(30).reshape(2,3,5)
...: b = np.array([[True, True, False], [False, True, True]])
In [136]: x
Out[136]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]])
In [137]: b
Out[137]:
array([[ True, True, False],
[False, True, True]])
In [138]: x[b]
Out[138]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]])
This is a 2d array. The mask b selects elements from the first 2 dimensions. The False values cause it to skip the [10...] and [15...] rows.
We can slice on the last dimension:
In [139]: x[b,:3]
Out[139]:
array([[ 0, 1, 2],
[ 5, 6, 7],
[20, 21, 22],
[25, 26, 27]])
but a list index will produce an error (unless its length is 4, or 1, so that it can broadcast against the 4 selected rows):
In [140]: x[b,[0,1,2]]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-140-7f1dbec100f2> in <module>
----> 1 x[b,[0,1,2]]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (4,) (3,)
The reason is that the boolean mask is effectively translated into the index arrays returned by np.nonzero (equivalent to np.where called with a single argument):
In [141]: np.nonzero(b)
Out[141]: (array([0, 0, 1, 1]), array([0, 1, 1, 2]))
nonzero found 4 nonzero (True) elements. The x[b] indexing is then equivalent to:
In [143]: x[[0,0,1,1],[0,1,1,2],:]
Out[143]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]])
https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#boolean-array-indexing
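As a quick check of that equivalence, the sketch below reuses the x and b from above and confirms that the mask and the nonzero index arrays select the same rows. The names via_mask and via_index are only illustrative.

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
b = np.array([[True, True, False], [False, True, True]])

# A 2d boolean mask on a 3d array behaves like the pair of integer
# index arrays returned by np.nonzero (one array per masked dimension).
I, J = np.nonzero(b)

via_mask = x[b]         # boolean indexing
via_index = x[I, J, :]  # equivalent integer-array indexing

print(via_mask.shape)                       # (4, 5)
print(np.array_equal(via_mask, via_index))  # True
```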
The shape mismatch then becomes more obvious:
In [144]: x[[0,0,1,1],[0,1,1,2],[1,2,3]]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-144-1efd76049cb0> in <module>
----> 1 x[[0,0,1,1],[0,1,1,2],[1,2,3]]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (4,) (3,)
If the lists match in size, the indexing runs, but produces a 'diagonal', not a block:
In [145]: x[[0,0,1,1],[0,1,1,2],[1,2,3,4]]
Out[145]: array([ 1, 7, 23, 29])
As you found, the two-stage indexing works for reading, but not for setting values, because the first indexing step returns a copy:
In [146]: x[[0,0,1,1],[0,1,1,2]][:,[1,2,3]]
Out[146]:
array([[ 1, 2, 3],
[ 6, 7, 8],
[21, 22, 23],
[26, 27, 28]])
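To see why the two-stage form cannot set values, here is a small sketch with the same x as above (the names tmp and before are only illustrative). The first advanced-indexing step returns a copy, so the assignment lands in that temporary and x is left untouched.

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
before = x.copy()

# Step 1 uses advanced (list) indexing, which returns a copy, not a view.
tmp = x[[0, 0, 1, 1], [0, 1, 1, 2]]
print(np.shares_memory(tmp, x))  # False

# The chained assignment therefore modifies only a temporary copy.
x[[0, 0, 1, 1], [0, 1, 1, 2]][:, [1, 2, 3]] = 0
print(np.array_equal(x, before))  # True: x is unchanged
```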
We can get the block by 'transposing' the last index list:
In [147]: x[[0,0,1,1],[0,1,1,2],[[1],[2],[3]]]
Out[147]:
array([[ 1, 6, 21, 26],
[ 2, 7, 22, 27],
[ 3, 8, 23, 28]])
OK, this is the transpose of the block we want. We could transpose the result, or we could 'transpose' the index arrays derived from b first:
In [148]: I,J=np.nonzero(b)
In [149]: x[I[:,None], J[:,None], [1,2,3]]
Out[149]:
array([[ 1, 2, 3],
[ 6, 7, 8],
[21, 22, 23],
[26, 27, 28]])
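The block shape comes from broadcasting the three index arrays against each other. The sketch below uses np.broadcast only to inspect the resulting shapes; idx is an illustrative name for the last-axis list.

```python
import numpy as np

b = np.array([[True, True, False], [False, True, True]])
I, J = np.nonzero(b)
idx = np.array([1, 2, 3])

# Shapes (4,), (4,), (3,) cannot broadcast: that is the IndexError above.
# Shapes (4,1), (4,1), (3,) broadcast to (4,3): one row per True cell
# of b, one column per entry of idx.
print(np.broadcast(I[:, None], J[:, None], idx).shape)  # (4, 3)

# The 'transposed' alternative: (4,), (4,), (3,1) broadcast to (3,4).
print(np.broadcast(I, J, idx[:, None]).shape)  # (3, 4)
```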
And this works for setting:
In [150]: x[I[:,None], J[:,None], [1,2,3]]=0
In [151]: x
Out[151]:
array([[[ 0, 0, 0, 0, 4],
[ 5, 0, 0, 0, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 0, 0, 0, 24],
[25, 0, 0, 0, 29]]])
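Applied to the original question, the same pattern might look like the sketch below. The array size, the sample data, and the doubling operation are all made up for illustration; only x's dtype and scale_idx follow the question.

```python
import numpy as np

# Hypothetical small stand-in for the question's array.
x = np.zeros((2, 3, 5), dtype=np.float32)
x[0, 1] = [1, 2, 3, 4, 5]
x[1, 2] = [0, 0, 6, 7, 8]  # nonzero in column 3, so any() still selects it

scale_idx = np.array([0, 1, 3])

# Positions in the first two dimensions with a nonzero value in any chosen column.
b = x[:, :, scale_idx].any(axis=2)
I, J = np.nonzero(b)

# Column vectors broadcast against scale_idx to address an (n, 3) block,
# both for reading and for in-place assignment.
block = x[I[:, None], J[:, None], scale_idx]
x[I[:, None], J[:, None], scale_idx] = block * 2  # placeholder operation

print(x[0, 1].tolist())  # [2.0, 4.0, 3.0, 8.0, 5.0]
print(x[1, 2].tolist())  # [0.0, 0.0, 6.0, 14.0, 8.0]
```

Values are cast to x's dtype on assignment, so it is worth checking x.dtype before writing floating-point results back.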
It's a long answer. I had a general idea of what was happening, but needed to work out the details. Plus, you need to understand what's going on.
Upvotes: 4