Dschoni

Reputation: 3872

Fast value swapping in numpy array

So, this is something that should be pretty easy, but it takes an enormous amount of time for me: I have a numpy array with only two values (for example 0 and 255), and I want to invert the matrix so that the two values swap (0 becomes 255 and vice versa). The matrices have about 2000³ entries, so this is serious work! I first tried the numpy.invert method, which did not do exactly what I expected. So I tried to do it myself by temporarily "storing" the values under a placeholder and then overwriting them:

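for i in range(len(array)):              # iterate over the first axis
    array[i][array[i] == 255] = 1        # mark the 255s with a temporary value
    array[i][array[i] == 0] = 255        # 0 -> 255
    array[i][array[i] == 1] = 0          # marked cells -> 0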

This behaves as expected, but takes a long time (I guess due to the for loop?). Would it be faster to implement this as a multithreaded calculation, where every thread "inverts" a smaller sub-array? Or is there a more convenient way of doing it?

Upvotes: 2

Views: 2200

Answers (4)

Eric O. Lebigot

Reputation: 94605

You can simply do:

arr_inverted = 255-arr

This converts all the elements in one vectorized, element-wise operation (255 gives 0 and 0 gives 255). More generally, if you only have two values a and b, the "inversion" is simply (a+b)-arr. This also works if the two values are not integers (e.g. floats or complex numbers).
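A minimal sketch of the (a+b)-arr idea as a reusable function. The name swap_two_values is hypothetical, and the sketch assumes arr holds only the values a and b (any other value v would also be mapped to (a+b)-v). For unsigned integer arrays it also assumes a+b fits in the dtype, which is the case for 0 and 255 with uint8.

```python
import numpy as np

def swap_two_values(arr, a, b):
    """Return a new array where a becomes b and b becomes a (hypothetical helper).

    Assumes arr contains only the two values a and b.
    """
    # Element-wise: (a + b) - a == b and (a + b) - b == a
    return (a + b) - arr

# Works for floats...
x = np.array([1.5, 3.0, 3.0, 1.5])
print(swap_two_values(x, 1.5, 3.0))

# ...and for the 0/255 uint8 case from the question
img = np.array([0, 255, 255, 0], dtype=np.uint8)
print(swap_two_values(img, 0, 255))
```

This version allocates a new array; the in-place variants discussed in this answer avoid that.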

As Jaime pointed out, if memory is a concern, np.subtract(255, arr, out=arr) swaps the values of arr in place.
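A small sketch of the in-place form on a toy uint8 array. Passing out=arr makes the ufunc write its result into arr's own buffer, so no second array of the same size is created; the ufunc also returns that same out array.

```python
import numpy as np

arr = np.array([[0, 255], [255, 0]], dtype=np.uint8)

# out=arr makes the ufunc write the result into arr's own buffer,
# so no second array of the same size is allocated
result = np.subtract(255, arr, out=arr)

print(arr)
print(result is arr)  # the ufunc returns the out array itself
```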

If your array holds integers, Janne Karila's in-place XOR solution has the advantage of being more concise than the in-place subtraction suggested above. It generalizes to arr ^= (a^b) for swapping two integers a and b.
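A sketch of the generalized XOR swap. The helper name swap_ints_inplace is hypothetical; it assumes an integer dtype and an array containing only a and b. It works because x ^ (a ^ b) equals b when x is a, and a when x is b.

```python
import numpy as np

def swap_ints_inplace(arr, a, b):
    """Swap the integers a and b in place (hypothetical helper name).

    Relies on x ^ (a ^ b) being b when x == a and a when x == b.
    Assumes arr has an integer dtype and contains only a and b.
    """
    arr ^= (a ^ b)
    return arr

arr = np.array([3, 10, 10, 3], dtype=np.int32)
swap_ints_inplace(arr, 3, 10)   # 3 ^ 10 == 9 is XORed into every element
print(arr)

# The 0/255 case reduces to arr ^= 255, since 0 ^ 255 == 255
img = np.array([0, 255], dtype=np.uint8)
swap_ints_inplace(img, 0, 255)
print(img)
```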

The execution times are similar between both methods (with a 200×200×200 array of uint8 integers, through IPython):

>>> arr = np.random.choice((0, 255), (200, 200, 200)).astype('uint8')
>>> %timeit np.bitwise_xor(255, arr, out=arr)
100 loops, best of 3: 7.65 ms per loop
>>> %timeit np.subtract(255, arr, out=arr)
100 loops, best of 3: 7.69 ms per loop

If your array is of type uint8, arr_inverted = ~arr takes the same time for swapping 0 and 255 (the ~ operator inverts all the bits), but it is less general, so it's not worth it (tested with a 200×200×200 array).

Upvotes: 4

Joe Kington

Reputation: 284970

In addition to @JanneKarila's and @EOL's excellent suggestions, it's worthwhile to show a more efficient approach to using a mask to do the swap.

Using a boolean mask is more generally useful if you have a more complex comparison than simply swapping two values, but your example uses it in a sub-optimal way.

Currently, you're making multiple temporary copies of the boolean "mask" array (e.g. array[i] == blah) in your example above and performing multiple assignments. You can avoid this by making the boolean "mask" array just once and then inverting it.

If you have enough RAM for a temporary copy (of bool dtype), try something like this:

mask = (data == 255)
data[mask] = 0
data[~mask] = 255

Alternatively, you could use numpy.where, which produces the same values but returns a new array instead of modifying data in place:

data = numpy.where(data == 255, 0, 255)
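A small sketch of the numpy.where variant on a uint8 array. Wrapping the replacement values as np.uint8 scalars is an assumption made here to keep the result in uint8; with plain Python integers the result typically gets NumPy's default integer type instead, which for a 2000³ array costs considerably more memory.

```python
import numpy as np

data = np.array([[0, 255], [255, 255]], dtype=np.uint8)

# np.where builds a new array; uint8 scalars keep the result dtype uint8
swapped = np.where(data == 255, np.uint8(0), np.uint8(255))

print(swapped)
print(swapped.dtype)
print(data)  # the original array is left unchanged
```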

If you were using a loop to avoid making a full temporary copy and need to conserve RAM, adjust your loop to something more like this:

for i in range(len(array)):
    mask = (array[i] == 255)   # boolean mask for slice i only
    array[i][mask] = 0         # write into slice i, not the whole array
    array[i][~mask] = 255

All that having been said, either subtraction or XOR is the way to go in this case, especially if you perform the operation in place!

Upvotes: 8

Janne Karila

Reputation: 25207

To swap 0 and 255, you can use XOR if the data type is one of the integer types.

array ^= 255
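A short sketch of both halves of that statement: XOR with 255 flips all 8 bits of a uint8 value, so 0 and 255 trade places, while a floating-point array has no bitwise XOR and raises a TypeError.

```python
import numpy as np

array = np.array([0, 255, 255, 0], dtype=np.uint8)
array ^= 255  # 0 ^ 255 == 255 and 255 ^ 255 == 0, bit by bit
print(array)

floats = np.array([0.0, 255.0])
xor_on_floats_failed = False
try:
    floats ^= 255
except TypeError as exc:
    # bitwise_xor is only defined for integer (and bool) dtypes
    xor_on_floats_failed = True
    print("XOR on floats fails:", exc)
```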

Upvotes: 4

that_guy

Reputation: 19

"I first tried the numpy.invert method, which is not exactly what I expected."

numpy.invert is exactly what you need. Can you describe what happened? Did you store the data as unsigned bytes (uint8) rather than as a signed or wider integer type?

Unsigned byte + numpy.invert should do exactly what you want.
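A quick check of why the dtype matters. That the original array used a signed or wider integer type is only a guess, since the question does not state its dtype.

```python
import numpy as np

u8 = np.array([0, 255], dtype=np.uint8)
i64 = np.array([0, 255], dtype=np.int64)

# uint8: flipping all 8 bits maps 0 -> 255 and 255 -> 0
print(np.invert(u8))
# signed integers (two's complement): ~x == -x - 1, so 0 -> -1 and 255 -> -256
print(np.invert(i64))
```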

[You should also see faster performance in numpy with unsigned bytes than with wider or signed datatypes]
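One plausible reason for that speed difference (an interpretation, not stated in the answer) is memory: at the question's 2000³ entries, the element size dominates both the memory footprint and the amount of data each pass has to touch. A sketch of the arithmetic, with array_gib as a hypothetical helper:

```python
import numpy as np

def array_gib(n_entries, dtype):
    """Memory needed for n_entries elements of dtype, in GiB (hypothetical helper)."""
    return n_entries * np.dtype(dtype).itemsize / 2**30

n_entries = 2000 ** 3  # the size mentioned in the question
for dtype in (np.uint8, np.int16, np.int32, np.int64):
    print(f"{np.dtype(dtype).name:>6}: {array_gib(n_entries, dtype):5.1f} GiB")
```

That is roughly 7.5 GiB as uint8 versus roughly 60 GiB as int64 for the same 8·10⁹ entries.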

Upvotes: 1
