Kamil Sindi
Kamil Sindi

Reputation: 22832

Delete element from multi-dimensional numpy array by value

Given a numpy array

a = np.array([[0, -1, 0], [1, 0, 0], [1, 0, -1]])

what's the fastest way to delete all elements of value -1 to get an array of the form

np.array([[0, 0], [1, 0, 0], [1, 0]])

Upvotes: 4

Views: 4157

Answers (5)

Daniel F
Daniel F

Reputation: 14399

For almost everything you might want to do with such an array, you can use a masked array

a = np.array([[0, -1, 0], [1, 0, 0], [1, 0, -1]])

b=np.ma.masked_equal(a,-1)

b
Out[5]: 
masked_array(data =
 [[0 -- 0]
 [1 0 0]
 [1 0 --]],
             mask =
 [[False  True False]
 [False False False]
 [False False  True]],
       fill_value = -1)

If you really want the ragged array, it can be .compressed() by line

c=np.array([b[i].compressed() for i in range(b.shape[0])])

c
Out[10]: array([array([0, 0]), array([1, 0, 0]), array([1, 0])], dtype=object)

Upvotes: 1

Mark Hannel
Mark Hannel

Reputation: 795

Another method you might consider:

def iterative_numpy(a):
    mask = a != 1
    out = np.array([ a[i,mask[i]] for i xrange(a.shape[0]) ])
    return out

Divakar's method loop_compr_based calculates sums along the rows of mask and a cumulative sum of that result. This method avoids such summations but still has to iterate through the rows of a. It also returns an array of arrays. This has the annoyance that out has to be indexed with the syntax out[1][2] rather than out[1,2]. Comparing the times with a matrix random integer matrices:

In [4]: a = np.random.random_integers(-1,1, size = (3,30))

In [5]: %timeit iterative_numpy(a)
100000 loops, best of 3: 11.1 us per loop

In [6]: %timeit loop_compr_based(a)
10000 loops, best of 3: 20.2 us per loop

In [7]: a = np.random.random_integers(-1,1, size = (30,3))

In [8]: %timeit iterative_numpy(a)
10000 loops, best of 3: 59.5 us per loop

In [9]: %timeit loop_compr_based(a)
10000 loops, best of 3: 30.8 us per loop

In [10]: a = np.random.random_integers(-1,1, size = (30,30))

In [11]: %timeit iterative_numpy(a)
10000 loops, best of 3: 64.6 us per loop

In [12]: %timeit loop_compr_based(a)
10000 loops, best of 3: 36 us per loop

When there are more columns than rows, iterative_numpy wins out. When there are more rows than columns, loop_compr_based wins but transposing a first will improve the performance of both methods. When the dimensions are comparably the same, loop_compr_based is best.

Important Side Discussion

Outside of the implementation, it's important to note that any numpy array which has a non-uniform shape is not an actual array in the sense that the values do not occupy a contiguous section of memory and further, the usual array operations will not work as expected.

As an example:

>>> a = np.array([[1,2,3],[1,2],[1]])
>>> a*2
array([[1, 2, 3, 1, 2, 3], [1, 2, 1, 2], [1, 1]], dtype=object)

Notice that numpy actually informs us that this is not the usual numpy array with the note dtype=object.

Thus it might be best to just make a list of numpy arrays and use them accordingly.

Upvotes: 4

Divakar
Divakar

Reputation: 221534

Approach #1 : Using NumPy splitting of array -

def split_based(a, val):
    mask = a!=val
    p = np.split(a[mask],mask.sum(1)[:-1].cumsum())
    out = np.array(list(map(list,p)))
    return out

Approach #2 : Using loop comprehension, but minimal work within the loop -

def loop_compr_based(a, val):
    mask = a!=val
    stop = mask.sum(1).cumsum()
    start = np.append(0,stop[:-1])
    am = a[mask].tolist()
    out = np.array([am[start[i]:stop[i]] for i  in range(len(start))])
    return out

Sample run -

In [391]: a
Out[391]: 
array([[ 0, -1,  0],
       [ 1,  0,  0],
       [ 1,  0, -1],
       [-1, -1,  8],
       [ 3,  7,  2]])

In [392]: split_based(a, val=-1)
Out[392]: array([[0, 0], [1, 0, 0], [1, 0], [8], [3, 7, 2]], dtype=object)

In [393]: loop_compr_based(a, val=-1)
Out[393]: array([[0, 0], [1, 0, 0], [1, 0], [8], [3, 7, 2]], dtype=object)

Runtime test -

In [387]: a = np.random.randint(-2,10,(1000,1000))

In [388]: %timeit split_based(a, val=-1)
10 loops, best of 3: 161 ms per loop

In [389]: %timeit loop_compr_based(a, val=-1)
10 loops, best of 3: 29 ms per loop

Upvotes: 2

Tanaike
Tanaike

Reputation: 201378

How about this?

print([[y for y in x if y > -1] for x in a])
[[0, 0], [1, 0, 0], [1, 0]]

Upvotes: 1

VanDarkholme
VanDarkholme

Reputation: 75

Use indexes = np.where(a == -1) to get indexes of elements Find indices of elements equal to zero from numpy array

Then delete specific elements by index with np.delete(your_array, indexes) How to remove specific elements in a numpy array

Upvotes: 0

Related Questions