user308883

how to exclude elements from numpy matrix

Suppose we have a matrix:

import numpy as np

mat = np.random.randn(5,5)

array([[-1.3979852 , -0.37711369, -1.99509723, -0.6151796 , -0.78780951],
       [ 0.12491113,  0.90526669, -0.18217331,  1.1252506 , -0.31782889],
       [-3.5933008 , -0.17981343,  0.91469733, -0.59719805,  0.12728085],
       [ 0.6906646 ,  0.2316733 , -0.2804641 ,  1.39864598, -0.09113139],
       [-0.38012856, -1.7230821 , -0.5779237 ,  0.30610451, -1.30015299]])

Suppose also that we have an index array:

idx = np.array([0,4,3,1,3])

While we can extract elements from the matrix using the following:

mat[idx, range(len(idx))]
array([-1.3979852 , -1.7230821 , -0.2804641 ,  1.1252506 , -0.09113139])

What I want to know is how we can use the index to exclude elements from the matrix, i.e. how do I obtain the following result:

array([[0.12491113 , -0.37711369, -1.99509723, -0.6151796 , -0.78780951],
       [-3.5933008 ,  0.90526669, -0.18217331, -0.59719805, -0.31782889],
       [0.6906646  , -0.17981343,  0.91469733,  1.39864598,  0.12728085],
       [-0.38012856,  0.2316733 , -0.5779237 ,  0.30610451, -1.30015299]])

Thought it would be as simple as doing mat[-idx, range(len(idx))] but that doesn't work. I've also tried np.delete() but that doesn't seem to do it either. Any solutions out there that don't require looping or list comprehensions? Would appreciate any insight. Thanks.

EDIT: data must be in the same columns post processing.

Upvotes: 1

Views: 1451

Answers (1)

hpaulj

Reputation: 231625

When you say 'delete' does not work, what do you mean? What does it do? That might be diagnostic.

Let's first look at the selection that does work:

In [484]: mat=np.arange(25).reshape(5,5) # I like this better than random

In [485]: mat[idx,range(5)]
Out[485]: array([ 0, 21, 17,  8, 19])

The same selection can also be made on a flattened version of the array:

In [486]: mat.flat[idx*5+np.arange(5)]
Out[486]: array([ 0, 21, 17,  8, 19])
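The flat index idx*5+np.arange(5) is hard-coded for a 5x5 array. As a small sketch (not part of the session above), np.ravel_multi_index can compute the same C-order flat indices from the (row, column) pairs for any shape; the names cols, flat_idx and picked are just illustrative:

```python
import numpy as np

mat = np.arange(25).reshape(5, 5)
idx = np.array([0, 4, 3, 1, 3])      # row to pick in each column
cols = np.arange(mat.shape[1])       # one entry per column

# Convert (row, column) pairs into indices into the C-ordered flat array.
flat_idx = np.ravel_multi_index((idx, cols), mat.shape)

picked = mat.ravel()[flat_idx]
print(flat_idx)   # same values as idx*5 + np.arange(5)
print(picked)
```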

Now try the same with np.delete, which works on the flattened array when no axis is given:

In [487]: np.delete(mat,idx*5+np.arange(5)).reshape(5,4)
Out[487]: 
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  9],
       [10, 11, 12, 13],
       [14, 15, 16, 18],
       [20, 22, 23, 24]])

np.delete doesn't operate in place; it returns a new array. And if you specify an axis, it removes whole rows or columns, not individual items.
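A quick check of both points, as a minimal sketch on the same arange matrix (the variable names are illustrative):

```python
import numpy as np

mat = np.arange(25).reshape(5, 5)

# np.delete returns a new array; mat itself is left unchanged.
without_row = np.delete(mat, 1, axis=0)   # drops all of row 1
without_col = np.delete(mat, 3, axis=1)   # drops all of column 3

print(mat.shape, without_row.shape, without_col.shape)
```

With an axis there is no way to say "row 1 of column 3 only", which is why the per-element deletion has to go through the flat array.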

mat[-idx, range(len(idx))] isn't going to work since negative indexes already have a meaning: they count from the end of the axis.
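To see what the negated index actually selects, here is a small sketch on the arange matrix; it picks one element per column, just from different rows, rather than excluding anything:

```python
import numpy as np

mat = np.arange(25).reshape(5, 5)
idx = np.array([0, 4, 3, 1, 3])

# -idx is [0, -4, -3, -1, -3]; negative rows count back from the last row,
# so this reads rows 0, 1, 2, 4, 2 in columns 0..4.
neg = mat[-idx, np.arange(5)]
print(neg)   # array([ 0,  6, 12, 23, 14])
```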

Internally, this kind of delete ends up doing boolean indexing, roughly like this:

In [498]: mat1=mat.ravel()
In [499]: idx1=idx*5+np.arange(5)
In [500]: ii=np.ones(mat1.shape, bool)
In [501]: ii[idx1]=False
In [502]: mat1[ii]
Out[502]: 
array([ 1,  2,  3,  4,  5,  6,  7,  9, 10, 11, 12, 13, 14, 15, 16, 18, 20, 22, 23, 24])

This sort of indexing/delete works even if you delete a different number of items from each row. Of course in that case you couldn't count on reshaping the result back into a rectangular array.
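As a sketch of the irregular case, assume some made-up (row, column) pairs that remove two items from row 0 and one from row 2. The mask still works, but the result is 1d; np.split can cut it back into per-row pieces of unequal length:

```python
import numpy as np

mat = np.arange(25).reshape(5, 5)

# Hypothetical pairs to drop: (0, 1), (0, 3) and (2, 4).
rows = np.array([0, 0, 2])
cols = np.array([1, 3, 4])

keep = np.ones(mat.shape, dtype=bool)
keep[rows, cols] = False

remaining = mat[keep]                      # 1-D, 22 elements
# Split at the running count of kept items per row.
row_ends = np.cumsum(keep.sum(axis=1))[:-1]
per_row = np.split(remaining, row_ends)    # list of 5 arrays, lengths differ
```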

In general when dealing with different indexes for different rows, the operation ends up acting on the flat or raveled version of the matrix. 'Irregular' operations usually make more sense when dealing with 1d arrays than with 2d.


Looking more carefully at your example, I see that when you remove an item, you move the other column values up to fill the gap. In my version, I moved values along rows. Let's try this with F (column-major) order.

In [523]: mat2=mat.flatten('F')
     ...: idx2=idx+5*np.arange(5)   # flat index of (idx[j], j) in F order
In [524]: np.delete(mat2,idx2).reshape(5,4).T
Out[524]: 
array([[ 5,  1,  2,  3,  4],
       [10,  6,  7, 13,  9],
       [15, 11, 12, 18, 14],
       [20, 16, 22, 23, 24]])

where I removed a value from each column:

In [525]: mat2[idx2]
Out[525]: array([ 0, 21, 17,  8, 19])
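The column-wise deletion can also be written without hard-coding the 5x5 shape. This is a minimal sketch using a boolean mask on the transpose instead of F-order flat indices; drop_one_per_column is a hypothetical helper name, and it assumes exactly one index per column:

```python
import numpy as np

def drop_one_per_column(mat, idx):
    """Return a copy of 2-D `mat` without element mat[idx[j], j] in each column j."""
    n_rows, n_cols = mat.shape
    keep = np.ones(mat.shape, dtype=bool)
    keep[idx, np.arange(n_cols)] = False
    # Boolean indexing walks the array in row-major order, so mask the
    # transpose to keep each column's survivors together, then transpose back.
    return mat.T[keep.T].reshape(n_cols, n_rows - 1).T

mat = np.arange(25).reshape(5, 5)
idx = np.array([0, 4, 3, 1, 3])
result = drop_one_per_column(mat, idx)
print(result)
```

On the arange matrix this should reproduce Out[524], with each remaining value still in its original column.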

Upvotes: 1
