Valus_Paulus

Reputation: 15

2D numpy array: Remove lines that contain only empty tuples

I have a 2D numpy array whose elements are tuples: either a tuple with two elements (an int and a str) or an empty tuple.

An example of how the 2D array may look:

import numpy as np

matrix = np.array(
[[(1, 'foo'), (), (4, 'bar')],
 [(),(),()],
 [(1, 'foo'), (), (3, 'foobar')],
 [(),(),()]],
dtype=object)

I'm looking to remove the lines that contain only empty tuples.

I tried the following code:

matrix = matrix[~np.all(matrix == (), axis=1)]

but it gave me the following error:

numpy.AxisError: axis 1 is out of bounds for array of dimension 0

The above code works for a 2D array that contains only integers when the condition inside np.all is matrix == 0: it correctly removes all lines that contain only zeros. Is there a way to do the same thing, but remove lines that contain only empty tuples instead of lines that contain only zeros?
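For reference, a minimal sketch of the integer case described above. The values in the hypothetical `int_matrix` are made up to mirror the layout of the tuple example; the point is that `int_matrix == 0` produces a full (4, 3) boolean array, so `axis=1` exists.

```python
import numpy as np

# Hypothetical integer matrix with the same layout as the tuple example
int_matrix = np.array([[1, 0, 4],
                       [0, 0, 0],
                       [1, 0, 3],
                       [0, 0, 0]])

# int_matrix == 0 is an elementwise boolean array of shape (4, 3),
# so np.all(..., axis=1) yields one flag per row
filtered = int_matrix[~np.all(int_matrix == 0, axis=1)]
print(filtered)
```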

Upvotes: 0

Views: 73

Answers (3)

Christoph Rackwitz

Reputation: 15365

As suggested in the comments, do not use NumPy here. NumPy is for numbers, and you don't have numbers. NumPy arrays can hold objects, but there's no benefit to that here, and you run into problems like the one you've seen.

You can just use a "list comprehension" and the all() function to filter your data.

lines = [
 [(1, 'foo'), (), (4, 'bar')],
 [(),(),()],
 [(1, 'foo'), (), (3, 'foobar')],
 [(),(),()]]

lines = [ line for line in lines if not all(elem == () for elem in line) ]
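Because an empty tuple is falsy and a non-empty tuple is truthy, the same filter can be sketched with any() instead of all(). This is a minor variant, not part of the answer above, and it assumes every row holds only tuples.

```python
rows = [
    [(1, 'foo'), (), (4, 'bar')],
    [(), (), ()],
    [(1, 'foo'), (), (3, 'foobar')],
    [(), (), ()],
]

# any(row) is True as soon as one tuple in the row is non-empty
filtered = [row for row in rows if any(row)]
print(filtered)
```

Unlike the explicit elem == () test, this would also drop rows made of other falsy values such as None or 0, which is why the tuples-only assumption matters.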

Upvotes: 0

Alexandre Novius

Reputation: 182

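The problem here is that tuples are sequence types. When you evaluate matrix == (), NumPy does not compare each element against the tuple object; it converts () into an empty array of shape (0,), which cannot be broadcast against the (4, 3) matrix. The elementwise comparison therefore fails, and in the NumPy version used in the question, matrix == () returns a single scalar False.

This explains the error axis 1 is out of bounds for array of dimension 0: np.all receives a 0-dimensional False, which has no axis 1.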
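A quick sketch to check this explanation: np.asarray(()) gives an empty 1-D array, and np.broadcast_shapes (available in NumPy 1.20 and later) confirms that its shape cannot be broadcast against a (4, 3) matrix.

```python
import numpy as np

# NumPy turns the empty tuple into an array with zero elements
empty = np.asarray(())
print(empty.shape)

# Broadcasting needs trailing dimensions that are equal or 1; here 3 vs 0 fails
broadcastable = True
try:
    np.broadcast_shapes((4, 3), empty.shape)
except ValueError as exc:
    broadcastable = False
    print("cannot broadcast:", exc)
```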

A workaround is to test for empty tuples in a different way, for example by vectorizing the len function:

>>> vect_len = np.vectorize(len)

Then, we can do:

>>> print(matrix[~np.all(vect_len(matrix) == 0, axis=1)])
[[(1, 'foo') () (4, 'bar')]
 [(1, 'foo') () (3, 'foobar')]]

Or, more simply:

>>> print(matrix[np.any(vect_len(matrix), axis=1)])
[[(1, 'foo') () (4, 'bar')]
 [(1, 'foo') () (3, 'foobar')]]
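Another way around the sequence problem described above, sketched as an alternative that is not part of this answer: store the empty tuple inside a one-element object array, so NumPy compares each matrix element against the tuple object itself instead of unpacking it. The helper name drop_empty_rows is hypothetical.

```python
import numpy as np

def drop_empty_rows(matrix):
    """Return the rows of a 2D object array that are not made only of empty tuples."""
    # Assigning through a scalar index stores the tuple as a single object,
    # so `empty` has shape (1,) and broadcasts against any 2D matrix
    empty = np.empty(1, dtype=object)
    empty[0] = ()
    # Elementwise Python == on object arrays: True where the element is ()
    is_empty = matrix == empty
    return matrix[~np.all(is_empty, axis=1)]

matrix = np.array(
    [[(1, 'foo'), (), (4, 'bar')],
     [(), (), ()],
     [(1, 'foo'), (), (3, 'foobar')],
     [(), (), ()]],
    dtype=object)

result = drop_empty_rows(matrix)
print(result)
```

This keeps the shape of the original one-liner, ~np.all(matrix == ..., axis=1), and only changes what the matrix is compared against.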

Upvotes: 2

Cardstdani

Reputation: 5223

You can traverse the array with a for loop and use the all() function to check whether a row consists only of empty tuples:

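import numpy as np

# dtype=object is required: recent NumPy versions raise a ValueError for ragged input without it
matrix = np.array([[(1, 'foo'), (), (4, 'bar')], [(), (), ()],
                   [(1, 'foo'), (), (3, 'foobar')], [(), (), ()]], dtype=object)

# Walk the rows backwards so that deleting a row never shifts a row that is still to be checked
for i in reversed(range(len(matrix))):
    if all(x == () for x in matrix[i]):
        matrix = np.delete(matrix, i, axis=0)
print(matrix)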

Output:

[[(1, 'foo') () (4, 'bar')]
 [(1, 'foo') () (3, 'foobar')]]
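If the explicit loop is kept, a variant that avoids calling np.delete once per removed row is to collect one keep/drop flag per row and index the array a single time. This is a sketch, not the answer's code.

```python
import numpy as np

matrix = np.array(
    [[(1, 'foo'), (), (4, 'bar')],
     [(), (), ()],
     [(1, 'foo'), (), (3, 'foobar')],
     [(), (), ()]],
    dtype=object)

# One flag per row: True if at least one tuple in the row is non-empty
keep = np.array([not all(x == () for x in row) for row in matrix], dtype=bool)

# A single boolean index builds the filtered array in one step
matrix = matrix[keep]
print(matrix)
```

Since every row is tested against the unmodified array, consecutive empty rows are handled without any index bookkeeping.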

Upvotes: 0
