Reputation: 297
This is a noob question on Python.
Is there a way in Python to truncate a few bytes from the beginning of a bytearray without copying the content to another memory location? The following is what I am doing:
inbuffer = bytearray()
inbuffer.extend(someincomingbytedata)
x = inbuffer[0:10]
del inbuffer[0:10]
I need to retain the truncated bytes (referenced by x) and perform some operations on them.
Will x point to the same memory location as inbuffer[0], or will the 3rd line in the above code make a copy of the data? Also, if no copy is made, will the deletion in the last line also delete the data referenced by x? Since x is still referencing that data, the GC should not reclaim it. Is that right?
Edit:
If this is not the right way to truncate a byte buffer and return the truncated bytes without copying, is there any other type that supports such an operation safely?
Upvotes: 1
Views: 2286
Reputation: 2459
In your example, x will be a new object that holds a copy of the contents of inbuffer[0:10].
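A quick way to see the copy semantics, using made-up sample bytes in place of the incoming data (the sample values are an assumption for illustration only):

```python
inbuffer = bytearray(b"0123456789abcdef")  # sample data, assumed for illustration

x = inbuffer[0:10]   # slicing a bytearray builds a new bytearray (a copy)
del inbuffer[0:10]   # shrinks inbuffer in place; x is unaffected

x[0] = ord("Z")      # mutating the copy does not touch inbuffer

print(x)             # bytearray(b'Z123456789')
print(inbuffer)      # bytearray(b'abcdef')
```

Because x owns its own storage, deleting from inbuffer afterwards cannot invalidate it.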
To get a representation without copying, you need to use a memoryview (available in Python 2.7 and Python 3):
inbuffer_view = memoryview(inbuffer)
prefix = inbuffer_view[0:10]
suffix = inbuffer_view[10:]
Now prefix will point to the first 10 bytes of inbuffer, and suffix will point to the remaining contents of inbuffer. Both objects keep an internal reference to inbuffer, so you do not need to explicitly keep references to inbuffer or inbuffer_view.
Note that both prefix and suffix will be memoryviews, not bytearrays or bytes. You can create bytes and bytearrays from them, but at that point the contents will be copied.
memoryviews can be passed to any function that works with objects that implement the buffer protocol. So, for example, you can write them directly into a file using fh.write(suffix).
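A minimal sketch of how the views behave, assuming Python 3 on CPython and made-up sample bytes. One consequence worth knowing: while any memoryview on a bytearray is alive, the bytearray cannot be resized, so the `del inbuffer[0:10]` from the question raises BufferError until the views are released.

```python
inbuffer = bytearray(b"0123456789abcdef")  # sample data, assumed for illustration
view = memoryview(inbuffer)
prefix = view[0:10]   # no copy: a window onto the first 10 bytes
suffix = view[10:]    # no copy: a window onto the rest

# Writing through a view changes the underlying bytearray.
prefix[0] = ord("X")
print(inbuffer[:1])   # bytearray(b'X')

# Resizing the bytearray is refused while views are exported.
try:
    del inbuffer[0:10]
    resize_blocked = False
except BufferError:
    resize_blocked = True
print(resize_blocked)  # True

# Converting a view to bytes copies the data at that point.
x = bytes(prefix)

# After every view is released, the bytearray can be resized again.
prefix.release()
suffix.release()
view.release()
del inbuffer[0:10]
print(inbuffer)        # bytearray(b'abcdef')
```

So the views avoid copying, but they do not combine with in-place truncation of the same bytearray; slicing the view further (for example `suffix = suffix[10:]`) is the zero-copy way to advance.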
Upvotes: 1
Reputation: 56517
It is very easy to check:
>>> inbuffer = bytearray([1, 2, 3, 4, 5])
>>> x = inbuffer[0:2]
>>> print(id(x) == id(inbuffer))
False
So it is not the same object.
Also you are asking about x pointing at inbuffer[0]. You seem to misunderstand something. Arrays in Python don't work the same way as arrays in C. The address of inbuffer is not the address of inbuffer[0]:
>>> inbuffer = bytearray([1, 2, 3, 4, 5])
>>> print(id(inbuffer) == id(inbuffer[0]))
False
These are wrappers around C-level arrays.
Also, in Python everything is an object, and CPython caches small integers (-5 through 256, which covers every value a bytearray element can hold). Indexing a bytearray therefore returns the cached int object for that value, not a pointer into the buffer:
>>> inbuffer = bytearray([1, 2, 3, 4, 5])
>>> print(id(inbuffer[0]) == id(1))
True
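As a small illustration of the same point without relying on id(), the sketch below shows that a value read by index is an independent int, so later writes to the buffer do not change it:

```python
inbuffer = bytearray([1, 2, 3, 4, 5])

first = inbuffer[0]          # indexing returns a plain int value
inbuffer[0] = 9              # overwrite the byte stored in the buffer

print(type(first).__name__)  # int
print(first)                 # 1
print(inbuffer[0])           # 9
```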
Upvotes: 0
Reputation: 104762
You can use the iterator protocol and itertools.islice to pull the first 10 values out of your someincomingbytedata iterable before putting the rest into inbuffer. This doesn't use the same memory for all the bytes, but it's about as good as you can get at avoiding unnecessary copying with a bytearray:
import itertools
it = iter(someincomingbytedata)
x = bytearray(itertools.islice(it, 10)) # consume the first 10 bytes
inbuffer = bytearray(it) # consume the rest
If you really do need to do your reading all up front and then efficiently view various slices of it without copying, you might consider using numpy. If you load your data into a numpy array, any slices you take later will be views into the same memory:
import numpy as np
inbuffer = np.array(someincomingbytedata, dtype=np.uint8) # load data into an array of bytes
x = inbuffer[:10] # grab a view of the first ten bytes, which does not require a copy
inbuffer = inbuffer[10:] # change inbuffer to reference a slice; no copying here either
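To check that the slices really share memory, here is a small sketch with made-up sample bytes. It swaps in np.frombuffer for np.array so that the array wraps an existing bytearray directly; that substitution and the names raw and rest are assumptions for illustration, not part of the snippet above.

```python
import numpy as np

raw = bytearray(b"0123456789abcdef")            # sample data, assumed for illustration
inbuffer = np.frombuffer(raw, dtype=np.uint8)   # wraps raw's memory without copying

x = inbuffer[:10]      # view of the first ten bytes
rest = inbuffer[10:]   # view of the remainder

print(np.shares_memory(x, inbuffer))  # True

x[0] = ord("Z")        # writing through the view changes the original bytearray
print(raw[:1])         # bytearray(b'Z')
```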
Upvotes: 0