Reputation: 136615
According to the BytesIO docs:
getbuffer()
Return a readable and writable view over the contents of the buffer without copying them. Also, mutating the view will transparently update the contents of the buffer:
getvalue()
Return bytes containing the entire contents of the buffer.
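To make the two documented behaviours concrete, here is a minimal sketch (the variable names are only for illustration). It shows that writing through the view returned by getbuffer() changes what a later getvalue() returns, while bytes obtained earlier from getvalue() are unaffected.

```python
from io import BytesIO

buf = BytesIO(b"abcdef")

# getvalue() returns an immutable bytes object with the current contents.
before = buf.getvalue()

# getbuffer() returns a memoryview over the buffer's storage.
view = buf.getbuffer()
view[2:4] = b"56"  # mutate the buffer through the view

after = buf.getvalue()
print(before)  # b'abcdef'
print(after)   # b'ab56ef'

# While a view exists the BytesIO cannot be resized or closed,
# so release the view once it is no longer needed.
view.release()
```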
So it seems as if getbuffer is more complicated. But if you don't need a writable view? Would you then simply use getvalue? What are the trade-offs?
In this example, it seems as if they do exactly the same:
# Create an example
from io import BytesIO
bytesio_object = BytesIO(b"Hello World!")
# Write the stuff
with open("output.txt", "wb") as f:
    f.write(bytesio_object.getbuffer())
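As a quick check that both calls put the same bytes on disk, here is a small sketch. The temporary directory and the two file names are assumptions made for this example, not part of the question.

```python
import os
import tempfile
from io import BytesIO

bytesio_object = BytesIO(b"Hello World!")

with tempfile.TemporaryDirectory() as tmp:
    path_value = os.path.join(tmp, "via_getvalue.txt")
    path_buffer = os.path.join(tmp, "via_getbuffer.txt")

    # getvalue() hands write() a bytes object.
    with open(path_value, "wb") as f:
        f.write(bytesio_object.getvalue())

    # getbuffer() hands write() a memoryview; the with-block releases it.
    with open(path_buffer, "wb") as f, bytesio_object.getbuffer() as view:
        f.write(view)

    # Read both files back and compare their contents.
    with open(path_value, "rb") as f1, open(path_buffer, "rb") as f2:
        same_content = f1.read() == f2.read()

print(same_content)  # True
```

For writing to a file the result is identical; the difference lies in what object is handed to write() and how long it stays alive.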
Upvotes: 21
Views: 6644
Reputation: 2706
This question is old, but it looks like nobody has answered this sufficiently.
Simply:
obj.getbuffer() creates a memoryview object.
If a memoryview of obj is present, obj.getvalue() will need to create a new, complete value.
If nothing has been written (since the last obj.getvalue() call) and there is no memoryview present, obj.getvalue() is the fastest method of access, and requires no copies.
That being the case:
... io.BytesIO, use obj.getvalue()
... obj.getbuffer()
... obj.getbuffer(), unless your file is tiny.
Avoid calling obj.getvalue() while a buffer is lying around.
Here, we see that it's all fast, and all well and good, if no buffer is lying around:
# time getvalue()
>>> import io
>>> i = io.BytesIO(b'f' * 1_000_000)
>>> %timeit i.getvalue()
34.6 ns ± 0.178 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# time getbuffer()
>>> %timeit i.getbuffer()
118 ns ± 0.495 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# time getbuffer() and getvalue() together
>>> %timeit i.getbuffer(); i.getvalue()
173 ns ± 0.829 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Everything is fine and works about as you'd expect. But let's see what happens when there's a buffer just lying around:
>>> x = i.getbuffer()
>>> %timeit i.getvalue()
33 µs ± 675 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Notice that we're no longer measuring in nanoseconds; we're measuring in microseconds. That's about three orders of magnitude slower. If you del x, we're back to being fast. This is all because, while a memoryview exists, Python has to account for the possibility that the BytesIO may have been written to. So, to give a definite state to the user, it copies the buffer.
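Outside IPython, the same comparison can be sketched with the standard timeit module. The helper name seconds_per_getvalue and its parameters are hypothetical, chosen for this example, and the absolute numbers will vary with the machine and the Python version.

```python
import timeit
from io import BytesIO


def seconds_per_getvalue(hold_view, size=1_000_000, loops=200):
    """Average time of one getvalue() call, with or without a live view."""
    stream = BytesIO(b"f" * size)
    view = stream.getbuffer() if hold_view else None
    try:
        total = timeit.timeit(stream.getvalue, number=loops)
    finally:
        if view is not None:
            view.release()  # drop the view, like `del x` in the session
    return total / loops


without_view = seconds_per_getvalue(hold_view=False)
with_view = seconds_per_getvalue(hold_view=True)
print(f"no view:   {without_view * 1e9:.0f} ns per call")
print(f"live view: {with_view * 1e9:.0f} ns per call")
```

Releasing the view explicitly, or using it in a with-block, keeps the slow path from lingering longer than the view is actually needed.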
Upvotes: 5