Mr. Liu

Reputation: 347

How to delete columns with the same values in NumPy

How can I delete every column whose values are all the same from a NumPy array?

For example, if I have this matrix:

[0 1 2 3 1]  
[0 2 2 1 0]  
[0 4 2 3 4]  
[0 1 2 3 4]  
[0 1 2 4 5]

I want to get a new matrix that looks like this:

[1 3 1]  
[2 1 0]  
[4 3 4]  
[1 3 4]  
[1 4 5]

Upvotes: 2

Views: 2678

Answers (2)

Tasos Papastylianou

Reputation: 22245

Assuming

import numpy
a = numpy.array([[0, 1, 2, 3, 1],
                 [0, 2, 2, 1, 0],
                 [0, 4, 2, 3, 4],
                 [0, 1, 2, 3, 4],
                 [0, 1, 2, 4, 5]])

then

b = a == a[0,:]   # compares first row with all others using broadcasting
# b: array([[ True,  True,  True,  True,  True],
#           [ True, False,  True, False, False],
#           [ True, False,  True,  True, False],
#           [ True,  True,  True,  True, False],
#           [ True,  True,  True, False, False]], dtype=bool)

Calling all with axis=0 reduces down each column (across the rows), acting as a column-wise logical AND (thanks Divakar!):

c = b.all(axis=0)
# c: array([ True, False,  True, False, False], dtype=bool)

Negate that mask and use it as a boolean index on the columns:

a[:, ~c]
Out: 
array([[1, 3, 1],
       [2, 1, 0],
       [4, 3, 4],
       [1, 3, 4],
       [1, 4, 5]])

As an ugly one-liner:

a[:, ~(a == a[0,:]).all(0)]
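
If you need this in more than one place, the steps above can be wrapped in a small helper. This is only a sketch: the name drop_constant_columns is made up for illustration, and it assumes a 2-D input with at least one row (a[0, :] raises IndexError otherwise).

```python
import numpy as np

def drop_constant_columns(a):
    """Return a copy of 2-D array `a` without the columns whose entries are all equal.

    Hypothetical helper name; assumes `a` has at least one row.
    """
    a = np.asarray(a)
    # Broadcast the first row against every row, then AND down each column.
    constant = (a == a[0, :]).all(axis=0)
    # Boolean indexing on the column axis returns a new array (a copy).
    return a[:, ~constant]

a = np.array([[0, 1, 2, 3, 1],
              [0, 2, 2, 1, 0],
              [0, 4, 2, 3, 4],
              [0, 1, 2, 3, 4],
              [0, 1, 2, 4, 5]])

result = drop_constant_columns(a)
print(result)
```

Because boolean indexing copies, the original a is left untouched.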

Upvotes: 2

akuiper

Reputation: 215057

You can compare the array with a copy of itself shifted by one row. If every pair of adjacent rows is equal within a column, that column contains only one unique value, and it can be removed with boolean indexing:

import numpy as np   # a is the array from the question

a[:, ~np.all(a[1:] == a[:-1], axis=0)]

#array([[1, 3, 1],
#       [2, 1, 0],
#       [4, 3, 4],
#       [1, 3, 4],
#       [1, 4, 5]])
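
To see why the shifted comparison works, it may help to look at the intermediate arrays. The sketch below uses the array from the question; the variable names (pairs_equal, constant, constant_first_row) are just for illustration. It also checks that the mask agrees with the one you get by comparing against the first row.

```python
import numpy as np

a = np.array([[0, 1, 2, 3, 1],
              [0, 2, 2, 1, 0],
              [0, 4, 2, 3, 4],
              [0, 1, 2, 3, 4],
              [0, 1, 2, 4, 5]])

# Row i+1 compared with row i, for every adjacent pair: shape (4, 5).
pairs_equal = a[1:] == a[:-1]

# A column is constant only if all of its adjacent pairs match.
constant = np.all(pairs_equal, axis=0)

# Same mask, computed by comparing every row against the first row.
constant_first_row = (a == a[0]).all(axis=0)

print(pairs_equal.shape)
print(constant)
print(np.array_equal(constant, constant_first_row))
```

Note that for an array with a single row, a[1:] is empty, so np.all returns True for every column and all columns are dropped.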

Upvotes: 5
