Vivek Madani

Reputation: 297

Python: Zero Copy while truncating a byte buffer

This is a noob question on Python.

Is there a way in Python to truncate a few bytes from the beginning of a bytearray without copying the content to another memory location? Here is what I am doing:

inbuffer = bytearray()
inbuffer.extend(someincomingbytedata)
x = inbuffer[0:10]
del inbuffer[0:10]

I need to retain the truncated bytes (referenced by x) and perform some operation on it.

Will x point to the same memory location as inbuffer[0], or will the 3rd line in the above code make a copy of the data? Also, if no copy is made, will the deletion in the last line also delete the data referenced by x? Since x still references that data, the GC should not reclaim it. Is that right?

Edit:

If this is not the right way to truncate a byte buffer and return the truncated bytes without copying, is there another type that supports such an operation safely?

Upvotes: 1

Views: 2286

Answers (3)

Nikratio

Reputation: 2459

In your example, x will be a new object that holds a copy of the contents of inbuffer[0:10].
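A quick way to confirm this, assuming CPython 3 and a hypothetical sample payload standing in for someincomingbytedata. Because x owns its own copy, the del on inbuffer cannot affect it:

```python
# Hypothetical sample data standing in for the question's someincomingbytedata.
someincomingbytedata = b"0123456789abcdef"

inbuffer = bytearray()
inbuffer.extend(someincomingbytedata)

x = inbuffer[0:10]   # slicing a bytearray allocates a new bytearray and copies 10 bytes
del inbuffer[0:10]   # shrinks inbuffer in place; x keeps its own copy

print(x)              # bytearray(b'0123456789')
print(inbuffer)       # bytearray(b'abcdef')
print(x is inbuffer)  # False
```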

To get a representation without copying, you need to use a memoryview (available in Python 3, and in Python 2 since 2.7):

inbuffer_view = memoryview(inbuffer)
prefix = inbuffer_view[0:10]
suffix = inbuffer_view[10:]

Now prefix will point to the first 10 bytes of inbuffer, and suffix will point to the remaining contents of inbuffer. Both objects keep an internal reference to inbuffer, so you do not need to explicitly keep references to inbuffer or inbuffer_view.
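The sketch below (CPython 3, sample data made up) shows both sides of this: a write through the bytearray is visible in the view, which proves no copy was made, and while any view is alive the bytearray cannot be resized, so `del inbuffer[0:10]` raises BufferError until every view is released:

```python
inbuffer = bytearray(b"0123456789abcdef")
inbuffer_view = memoryview(inbuffer)
prefix = inbuffer_view[0:10]
suffix = inbuffer_view[10:]

# A write through the bytearray shows up in the view: the view shares its memory.
inbuffer[0] = ord("X")
print(prefix.tobytes())   # b'X123456789'  (tobytes() itself makes a copy)

# While any view is alive, the bytearray refuses to change size.
resize_blocked = False
try:
    del inbuffer[0:10]
except BufferError:
    resize_blocked = True
print(resize_blocked)     # True

# Releasing every view lifts the restriction.
prefix.release()
suffix.release()
inbuffer_view.release()
del inbuffer[0:10]
print(inbuffer)           # bytearray(b'abcdef')
```

So with a memoryview you "truncate" by moving to a new slice of the view rather than by deleting from the bytearray.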

Note that both prefix and suffix will be memoryviews, not bytearrays or bytes. You can create bytes and bytearrays from them, but at that point the contents will be copied.

memoryviews can be passed to any function that works with objects that implement the buffer protocol. So, for example, you can write them directly into a file using fh.write(suffix).
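A few standard-library consumers that accept a memoryview directly, as a sketch: io.BytesIO stands in for a real file opened in binary mode, and the 4-byte big-endian length header is an invented example format:

```python
import hashlib
import io
import struct

# Invented layout: a 4-byte big-endian length header followed by a payload.
inbuffer = bytearray(b"\x00\x00\x00\x2aHello, world")
view = memoryview(inbuffer)
prefix = view[0:4]
suffix = view[4:]

fh = io.BytesIO()                             # stands in for a binary-mode file
fh.write(suffix)                              # accepts the view, no bytes() needed
(length,) = struct.unpack_from(">I", prefix)  # parse the header straight from the view
digest = hashlib.sha256(suffix).hexdigest()   # hashlib also reads buffer objects

print(length)          # 42
print(fh.getvalue())   # b'Hello, world'
```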

Upvotes: 1

freakish

Reputation: 56517

It is very easy to check:

>>> inbuffer = bytearray([1, 2, 3, 4, 5])
>>> x = inbuffer[0:2]
>>> print(id(x) == id(inbuffer))
False

So it is not the same object.

Also, you ask about x pointing at inbuffer[0], which suggests a misunderstanding. Arrays in Python don't work the same way as arrays in C. The address of inbuffer is not the address of inbuffer[0]:

>>> inbuffer = bytearray([1, 2, 3, 4, 5])
>>> print(id(inbuffer) == id(inbuffer[0]))
False

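A bytearray wraps a C-level array of raw bytes; indexing it returns a Python int object for the requested byte, not a pointer into that array.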
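The difference between indexing and slicing is easy to see from the result types (plain Python 3, no assumptions beyond the sample values):

```python
inbuffer = bytearray([1, 2, 3, 4, 5])

first = inbuffer[0]    # indexing returns a Python int for the stored byte
head = inbuffer[0:1]   # slicing returns a new bytearray

print(type(first).__name__, first)   # int 1
print(type(head).__name__)           # bytearray
```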

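Also, in Python everything is an object. CPython caches the integers from -5 to 256, which covers every value a bytearray element can hold (0 to 255), so indexing returns one of those shared int objects rather than a new one. Slicing is different: inbuffer[0:2] copies the underlying bytes into a new bytearray.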

>>> inbuffer = bytearray([1, 2, 3, 4, 5])
>>> print(id(inbuffer[0]) == id(1))
True
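The shared int objects only concern indexing. A slice still holds its own copy of the bytes, which a small mutation check shows (sample values as above):

```python
inbuffer = bytearray([1, 2, 3, 4, 5])
x = inbuffer[0:2]

x[0] = 99          # mutate the slice only
print(inbuffer)    # bytearray(b'\x01\x02\x03\x04\x05')  (original unchanged)
print(x)           # bytearray(b'c\x02')
```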

Upvotes: 0

Blckknght

Reputation: 104762

You can use the iterator protocol and itertools.islice to pull the first 10 values out of your someincomingbytedata iterable before putting the rest into inbuffer. This doesn't use the same memory for all the bytes, but it's about as good as you can get at avoiding unnecessary copying with a bytearray:

import itertools

it = iter(someincomingbytedata)
x = bytearray(itertools.islice(it, 10)) # consume the first 10 bytes
inbuffer = bytearray(it)                # consume the rest
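A runnable version of the same idea, assuming Python 3 (where iterating a bytes object yields ints) and a made-up payload in place of someincomingbytedata:

```python
import itertools

# Hypothetical sample payload standing in for someincomingbytedata.
someincomingbytedata = b"HEADER0001payload bytes"

it = iter(someincomingbytedata)           # one shared iterator over the input
x = bytearray(itertools.islice(it, 10))   # consumes exactly the first 10 values
inbuffer = bytearray(it)                  # consumes whatever is left

print(x)         # bytearray(b'HEADER0001')
print(inbuffer)  # bytearray(b'payload bytes')
```

Both bytearrays are built once from the iterator, so nothing is copied a second time, but each still owns separate memory.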

If you really do need to do your reading all up front and then efficiently view various slices of it without copying, you might consider using numpy. If you load your data into a numpy array, any slices you take later will be views into the same memory:

import numpy as np

inbuffer = np.frombuffer(someincomingbytedata, dtype=np.uint8)  # wrap the bytes in a uint8 array without copying (read-only if the source is bytes)
x = inbuffer[:10]  # grab a view of the first ten bytes, which does not require a copy
inbuffer = inbuffer[10:]  # change inbuffer to reference a slice; no copying here either
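A sketch that checks the "no copy" claim, assuming NumPy is installed and using np.frombuffer over a hypothetical bytearray so the array shares the bytearray's memory. A write to the source is visible through both slices:

```python
import numpy as np

someincomingbytedata = bytearray(b"0123456789abcdef")  # hypothetical sample data

inbuffer = np.frombuffer(someincomingbytedata, dtype=np.uint8)  # wraps the buffer, no copy
x = inbuffer[:10]          # basic slicing returns a view
inbuffer = inbuffer[10:]   # rebinding to another view; still no copy

someincomingbytedata[0] = ord("X")   # modify the original bytearray in place
print(x.tobytes())         # b'X123456789'  (the view sees the change)
print(inbuffer.tobytes())  # b'abcdef'
```

Because the arrays hold an export of the bytearray, the bytearray itself cannot be resized while they exist, just as with a memoryview.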

Upvotes: 0
