Reputation: 67

Deleting values from multiple arrays that have a particular value

Let's say I have two arrays: a = array([1,2,3,0,4,5,0]) and b = array([1,2,3,4,0,5,6]). I want to remove every element that is 0 in either a or b, and I also want to remove the element at the same position from the other array. Therefore what I want to end up with is a = array([1,2,3,5]) and b = array([1,2,3,5]). This is because a[3] == 0 and a[6] == 0, so both b[3] and b[6] are also deleted. Likewise, since b[4] == 0, a[4] is also deleted. It's simple to do this for two arrays:

import numpy as np
a = np.array([1,2,3,0,4,5,0])
b = np.array([1,2,3,4,0,5,6])

ix = np.where(b == 0)
b = np.delete(b, ix)
a = np.delete(a, ix)

ix = np.where(a == 0)
b = np.delete(b, ix)
a = np.delete(a, ix)

However, this solution doesn't scale if I have many arrays (which I do). What would be a more elegant way to do this?

If I try the following:

import numpy as np

a = np.array([1,2,3,0,4,5,0])
b = np.array([1,2,3,4,0,5,6])

arrays = [a,b]

for array in arrays:
    ix = np.where(array == 0)
    b = np.delete(b, ix)
    a = np.delete(a, ix)

I get a = array([1, 2, 3, 4]) and b = array([1, 2, 3, 0]), which is not the result I need. Any idea what is wrong here?

Upvotes: 0

Answers (4)

Jan Christoph Terasa

Reputation: 5935

Assuming both/all arrays always have the same length, you can use masks:

ma = a != 0  # mask: True where a is not equal to zero
mb = b != 0  # mask: True where b is not equal to zero
m = ma * mb  # intersection of ma and mb (element-wise AND; ma & mb is equivalent)
print(a[m], b[m])  # [1 2 3 5] [1 2 3 5]

You can of course also do it in one line

m = (a != 0) * (b != 0)

Or use the inverse

ma = a == 0
mb = b == 0
m = ~(ma + mb) # not the union of ma and mb
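With many arrays, the same mask can be built in a single call instead of writing one comparison per array. A minimal sketch, assuming all arrays have the same length and are collected in a list (the name arrays is just for illustration); np.logical_and.reduce combines the per-array masks element-wise:

```python
import numpy as np

a = np.array([1, 2, 3, 0, 4, 5, 0])
b = np.array([1, 2, 3, 4, 0, 5, 6])
arrays = [a, b]  # any number of equal-length arrays

# True at positions where every array is non-zero
m = np.logical_and.reduce([arr != 0 for arr in arrays])

# Apply the same mask to each array
filtered = [arr[m] for arr in arrays]
print(filtered[0], filtered[1])  # [1 2 3 5] [1 2 3 5]
```

Because the mask is computed once from all arrays, the order in which the arrays are listed does not matter.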

Upvotes: 3

dnalow

Reputation: 984

Building on Jan Christoph Terasa's answer, you can stack the arrays into one 2-D array and use array operations instead of for loops:

arrays = np.vstack([a, b])  # ...long list of arrays of equal length, one array per row

zeroind = (arrays == 0).any(axis=0)  # True for every column where at least one row contains a zero

pos_arrays = arrays[:, ~zeroind]  # 2-D array containing only the columns where no row contains a zero
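To get separate 1-D arrays back out, the rows of the filtered 2-D array can simply be unpacked. A small sketch with a hypothetical third array c, to show that the approach works for more than two arrays:

```python
import numpy as np

a = np.array([1, 2, 3, 0, 4, 5, 0])
b = np.array([1, 2, 3, 4, 0, 5, 6])
c = np.array([7, 0, 8, 9, 9, 9, 9])  # hypothetical third array for illustration

stacked = np.vstack([a, b, c])  # shape (3, 7): one row per array

# A column is dropped if any row has a zero in it
has_zero = (stacked == 0).any(axis=0)

kept = stacked[:, ~has_zero]  # shape (3, number of kept columns)

# Iterating over a 2-D array yields its rows, so they unpack directly
a2, b2, c2 = kept
print(a2, b2, c2)  # [1 3 5] [1 3 5] [7 8 9]
```

Note that np.vstack copies the data and converts all rows to a common dtype, so arrays of mixed dtypes will be upcast.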

Upvotes: 0

Noah Bogart

Reputation: 1767

A slow method makes two passes over the list of arrays: the first builds an intermediate list of indices to delete, and the second deletes the values at those indices from every array:

import numpy as np

a = np.array([1,2,3,0,4,5,0])
b = np.array([1,2,3,4,0,5,6])

arrays = [a, b]
vals = []

for array in arrays:
    ix = np.where(array == 0)
    vals.extend([y for x in ix for y in x.tolist()])

vals = list(set(vals))

new_array = []
for array in arrays:
    new_array.append(np.delete(array, vals))
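The same two passes can be written without flattening the np.where output by hand. A sketch that swaps in np.flatnonzero and np.unique for the nested list comprehension and set (same result, assuming all arrays have equal length):

```python
import numpy as np

a = np.array([1, 2, 3, 0, 4, 5, 0])
b = np.array([1, 2, 3, 4, 0, 5, 6])
arrays = [a, b]

# Pass 1: positions of zeros in every array, merged, deduplicated and sorted
zero_idx = np.unique(np.concatenate([np.flatnonzero(arr == 0) for arr in arrays]))

# Pass 2: delete those positions from every array
new_arrays = [np.delete(arr, zero_idx) for arr in arrays]
print(zero_idx)  # [3 4 6]
```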

Upvotes: 0

Zeokav

Reputation: 1703

This is happening because np.delete returns a new array, which you store in b and a inside the loop. The list arrays, however, still holds references to the original a and b: rebinding the names a and b does not change what the list contains. Hence the loop keeps iterating over the original arrays, and the indices are always computed from them. The first iteration returns the correct indices of 0 in a (3 and 6), but the second iteration computes ix from the original b, which gives 4, and index 4 is then deleted from the already-shortened a and b.
If you print the arrays variable in each iteration, you will see that it never changes.
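A quick check of what happens to the list when a name is reassigned. The list stores the same array objects as a and b, and np.delete returns a new array rather than modifying its input:

```python
import numpy as np

a = np.array([1, 2, 3, 0, 4, 5, 0])
b = np.array([1, 2, 3, 4, 0, 5, 6])
arrays = [a, b]

print(arrays[0] is a)  # True: the list holds the same object, not a copy

a = np.delete(a, [3, 6])  # np.delete returns a new array; only the name a is rebound

print(arrays[0] is a)  # False: the list still points at the original array
print(len(arrays[0]), len(a))  # 7 5
```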

You need to reassign arrays after processing each array so that the next iteration works on the updated arrays. Here's how you'd do it:

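import numpy as np

a = np.array([1, 2, 3, 0, 4, 5, 0])
b = np.array([1, 2, 3, 4, 0, 5, 6])
arrays = [a, b]
for i in range(len(arrays)):
  ix = np.where(arrays[i] == 0)  # zeros in the current, already-shortened array
  b = np.delete(b, ix)
  a = np.delete(a, ix)
  arrays = [a, b]  # rebind the list so the next iteration sees the updated arrays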

Of course, you can generalize what happens inside the loop to any number of arrays. I just wanted to explain what was happening.
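Automating the loop body for any number of arrays might look like the sketch below. The helper name drop_zero_positions is hypothetical, and the sketch assumes all arrays have the same length:

```python
import numpy as np


def drop_zero_positions(arrays):
    """Remove every position where any of the equal-length arrays is 0.

    Hypothetical helper that generalizes the loop above to N arrays.
    """
    arrays = list(arrays)
    for i in range(len(arrays)):
        # Indices are computed on the current (already shortened) array i
        ix = np.flatnonzero(arrays[i] == 0)
        # Delete those indices from every array and rebind the whole list
        arrays = [np.delete(arr, ix) for arr in arrays]
    return arrays


a = np.array([1, 2, 3, 0, 4, 5, 0])
b = np.array([1, 2, 3, 4, 0, 5, 6])
a, b = drop_zero_positions([a, b])
print(a, b)  # [1 2 3 5] [1 2 3 5]
```

Rebinding the whole list on every iteration is what fixes the original bug: each np.where/np.flatnonzero call sees the arrays as they are after the previous deletions.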

Upvotes: 1
