Alec Teal

Reputation: 5928

How can I quickly set a Python bytearray to 0

When a Python bytearray is created (with an integer passed to it) it creates a bytearray of that many bytes, and sets them all to zero.

I want to clear the bytearray, and it could be quite large. Iterating over it and setting each byte to zero is slow.

Is there a better way?

(memoryviews and bytearrays are poorly documented IMO)

Best resources so far (but none of them answer my question)

http://docs.python.org/dev/library/stdtypes.html#bytes-methods

http://docs.python.org/dev/library/functions.html#bytearray

Upvotes: 2

Views: 5561

Answers (4)

Oliver

Reputation: 29571

Here are a few different ways of clearing a bytearray without changing the reference (in case other objects refer to it):

  1. Using clear():

    >>> a=bytearray(10)
    >>> a
    bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
    >>> a.clear()
    >>> a
    bytearray(b'')
    
  2. Using slicing:

    >>> a=bytearray(10)
    >>> a[0:10] = []
    >>> a
    bytearray(b'')
    
  3. Using del:

    >>> a=bytearray(10)
    >>> b=a
    >>> del a[0:10]
    >>> a
    bytearray(b'')
    

You can verify that if another variable, say b, references the same object as a, none of the above techniques breaks that link. Resetting a by creating a new bytearray, however, does break it:

>>> a=bytearray(10)
>>> b=a
>>> b is a
True
>>> a=bytearray(10)
>>> b is a
False

However, all of the above change the array's size to 0. Perhaps you simply want to zero all the items, keeping the size unchanged and any references valid:

>>> a=bytearray(10)
>>> b=a
>>> b is a
True
>>> a[0:10]=bytearray(10)
>>> b is a
True

So with this technique you can easily zero any subsection of the array (in fact, of any mutable sequence that supports slice assignment).
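
As a minimal sketch of zeroing a subsection in place, written for Python 3 (`zero_range` is a hypothetical helper name used only for illustration, not a built-in):

```python
def zero_range(buf, start, stop):
    """Zero buf[start:stop] in place without changing len(buf)."""
    # Clamp the bounds the same way slicing does, so the replacement
    # has exactly as many bytes as the slice it overwrites.
    start, stop, _ = slice(start, stop).indices(len(buf))
    length = max(0, stop - start)
    buf[start:stop] = bytes(length)


a = bytearray(b"hello world")
b = a                      # second reference to the same object
zero_range(a, 2, 5)
print(a)                   # bytearray(b'he\x00\x00\x00 world')
print(b is a, len(a))      # True 11
```

Because the replacement has the same length as the slice, the bytearray is never resized. Note that the right-hand side is still a temporary object of that length.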

Upvotes: 2

unutbu

Reputation: 880399

Edit: This answer is wrong. s = s.translate('\0'*256) is slower than s = bytearray(256), so there is no point in using translate here. @gnibbler provides a better solution.


Bytearrays have many of the same methods that strings have. You could use the translate method:

In [64]: s = bytearray('Hello World')

In [65]: s
Out[65]: bytearray(b'Hello World')

In [66]: import string

In [67]: zero = string.maketrans(buffer(bytearray(range(256))),buffer(bytearray(256)))

In [68]: s.translate(zero)
Out[68]: bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

By the way, Dave Beazley has written a very useful introduction to bytearrays.


Or, slightly modifying millimoose's answer:

In [72]: s.translate('\0'*256)
Out[72]: bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

In [73]: %timeit s.translate('\0'*256)
1000000 loops, best of 3: 282 ns per loop

In [74]: %timeit s.translate(bytearray(256))
1000000 loops, best of 3: 398 ns per loop
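
The session above is Python 2; `string.maketrans` and `buffer` are not available in Python 3. A rough Python 3 equivalent might look like the sketch below. Since `translate` returns a new bytearray rather than modifying `s`, the result is copied back with slice assignment here (an addition for illustration, not part of the original session) so that other references still see the zeros:

```python
# 256-byte translation table that maps every byte value to zero.
# bytes(256) on its own would be an equivalent table.
zero_table = bytes.maketrans(bytes(range(256)), bytes(256))

s = bytearray(b"Hello World")
alias = s                 # second reference to the same object

# translate() builds a new bytearray; it does not modify s.
zeroed = s.translate(zero_table)
print(zeroed is s)        # False

# Copy the zeros back into the original object so aliases see them.
s[:] = zeroed
print(alias is s, alias == bytearray(len(alias)))  # True True
```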

Upvotes: 2

John La Rooy

Reputation: 304375

Why would you assume reallocating the bytearray is so slow? It's more than 10 times faster than using translate for large bytearrays!

I'm deleting the original bytearray first, so you don't have to worry about temporarily using double the memory.

# For a small bytearray, reallocation is a tiny bit faster

$ python -m timeit -s "s=bytearray('Hello World')" "s.translate('\0'*256)"
1000000 loops, best of 3: 0.672 usec per loop
$ python -m timeit -s "s=bytearray('Hello World')" "lens=len(s);del s;s=bytearray(lens)"
1000000 loops, best of 3: 0.522 usec per loop


# For a large bytearray, reallocation is much faster

$ python -m timeit -s "s=bytearray('Hello World'*10000)" "s.translate('\0'*256)"
1000 loops, best of 3: 225 usec per loop
$ python -m timeit -s "s=bytearray('Hello World'*10000)" "lens=len(s);del s;s=bytearray(lens)"
10000 loops, best of 3: 18.5 usec per loop

There's an even better way that allows s to keep the same reference: simply call the __init__ method on the instance.

>>> s=bytearray(b"hello world")
>>> id(s)
3074325152L
>>> s.__init__(len(s))
>>> s
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
>>> id(s)
3074325152L

Testing the timing

$ python -m timeit -s "s=bytearray('Hello World'*10000)" "s.__init__(len(s))"
100000 loops, best of 3: 18.7 usec per loop

I ran these gigabyte-sized tests on a different computer with more RAM:

$ python -m timeit -s "s=bytearray('HelloWorld'*100000000)" "s.__init__(len(s))"
10 loops, best of 3: 454 msec per loop
$ python -m timeit -s "s=bytearray('HelloWorld'*100000000)" "s.translate('\0'*256)"
10 loops, best of 3: 1.43 sec per loop
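
The timings above come from Python 2 on the answerer's machines. To repeat the comparison on Python 3, something like the following sketch could be used. The function names, `METHODS`, and `compare` are hypothetical, and the default size only roughly matches the `'Hello World'*10000` case. No timings are shown because they depend on the machine and interpreter.

```python
import timeit


def zero_by_init(buf):
    # Re-run the initializer on the existing object, as in the answer above.
    buf.__init__(len(buf))


def zero_by_slice(buf):
    # Overwrite the whole contents with a temporary all-zero bytes object.
    buf[:] = bytes(len(buf))


def zero_by_translate(buf):
    # translate() returns a new bytearray, so copy the result back.
    buf[:] = buf.translate(bytes(256))


def zero_by_loop(buf):
    # The byte-by-byte approach the question wants to avoid.
    for i in range(len(buf)):
        buf[i] = 0


METHODS = [zero_by_init, zero_by_slice, zero_by_translate, zero_by_loop]


def compare(size=110_000, repeats=20):
    """Return the best single-run time in seconds for each method."""
    results = {}
    for fn in METHODS:
        buf = bytearray(b"x" * size)
        timings = timeit.repeat(lambda: fn(buf), number=1, repeat=repeats)
        results[fn.__name__] = min(timings)
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds * 1e6:.1f} usec")
```

All four functions keep the identity of the bytearray, so they measure in-place zeroing rather than rebinding the name to a new object.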

Upvotes: 6

jramirez

Reputation: 8685

All you need to do is re-declare your bytearray:

b = bytearray(LEN_OF_BYTE_ARRAY)

Upvotes: 0
