Reputation: 61
I ran into what I thought was going to be a very simple problem (and I hope it is!): take raw data that I hold as a memoryview and decode it to a Unicode string.
Doing this is the obvious approach, and works:
the_string = mv.tobytes().decode("utf-8")
where mv is the memoryview in question. But that defeats the purpose of zero copy, because the tobytes() method creates a copy. So the next thing I tried was to "cast" the memoryview to a bytearray, in other words, to create a bytearray that uses the memoryview mv as its backing data. I thought this would be simple, but I cannot figure out how to do it. Does anyone out there know how?
Upvotes: 6
Views: 1123
Reputation: 17
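The answer is codecs.decode from the standard library. It accepts bytes-like objects, including a memoryview, so there is no need to call tobytes() first.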
For example:
>>> b = "Hello 你好".encode("utf-8")
>>> b
b'Hello \xe4\xbd\xa0\xe5\xa5\xbd'
>>> m = memoryview(b)
>>> m.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'memoryview' object has no attribute 'decode'
>>> import codecs
>>> codecs.decode(m, "utf-8")
'Hello 你好'
>>> codecs.decode(m[:-3], "utf-8")
'Hello 你'
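The same thing as a small script, in case it helps. This is only a sketch: the names backing, mv, via_codecs, via_str and prefix are made up for illustration, and the bytearray just stands in for whatever raw memory you actually have. As far as I know, in CPython both calls read directly from the buffer without building an intermediate bytes object, but the resulting str is of course a new object.

```python
import codecs

# Hypothetical buffer standing in for the raw memory from the question.
backing = bytearray("Hello 你好".encode("utf-8"))
mv = memoryview(backing)

# Decode directly from the memoryview; no tobytes() call is involved.
via_codecs = codecs.decode(mv, "utf-8")

# The str constructor also accepts bytes-like objects when an encoding is given.
via_str = str(mv, "utf-8")

# Slicing a memoryview yields another view of the same buffer, not a copy,
# so a sub-range can be decoded the same way. "Hello " is 6 bytes in UTF-8.
prefix = str(mv[:6], "utf-8")

print(via_codecs)
print(via_str)
print(repr(prefix))
```

Note that a slice has to end on a character boundary, otherwise decoding raises UnicodeDecodeError.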
Upvotes: -1