Reputation: 1477
I am writing a Python3 extension module for an existing C++ library which returns a string that appears to be in cp1252 encoding. The C++ function signature is
int get_name(std::string& name);
where name is the output variable. It contains a string whose c_str() contents look like 0xb04600: the degree symbol in the cp1252 code page (0xb0), followed by an upper-case F (0x46), terminated by the NUL character.
In my python extension C++ code, I wrote
std::string name;
int retval = get_name(name);
py_retval = Py_BuildValue((char *) "is#", retval, (name).c_str(), (name).size());
However, this causes the following runtime exception
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 0: invalid start byte
What is the correct way for me to return a cp1252-encoded string to Python?
UPDATE
I figured out that if I use y# instead of s# to return a Python bytes object from the extension, then I can convert that bytes object back to a string in my Python code with .decode('cp1252'). However, this is an extra step in Python that should be automated in the extension module. Unfortunately, I cannot figure out how.
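For reference, the workaround described in the update can be sketched in Python roughly as follows. The sample bytes come from the 0xb0 0x46 example above; the helper name decode_name is hypothetical and only stands in for whatever code receives the bytes object from the extension.

```python
# Sample value: what the extension would hand back when built with "y#"
# (degree symbol 0xb0 followed by 'F', taken from the example in the question).
raw = b"\xb0F"


def decode_name(raw_bytes: bytes) -> str:
    """Hypothetical helper: decode cp1252 bytes returned by the extension."""
    return raw_bytes.decode("cp1252")


# Decoding as UTF-8 fails, which is what "s#" attempts inside Py_BuildValue:
# 0xb0 is a UTF-8 continuation byte and cannot start a character.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

name = decode_name(raw)
print(name)
```

This is the manual step that ideally would happen inside the extension instead.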
Upvotes: 3
Views: 324
Reputation: 40053
PyUnicode_Decode can do this job for any standard encoding without even having to make a bytes object first. (You can pass its result to Py_BuildValue with format code N to avoid worrying about reference counts, although that trick doesn't apply in all cases.)
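As a rough illustration, the same C API call can be tried from Python through ctypes.pythonapi (CPython only), which makes it easy to check the arguments before putting them in the extension. The sample bytes are taken from the question, and "strict" is an assumed choice for the error handler. In the extension itself, the equivalent would presumably be something like Py_BuildValue("iN", retval, PyUnicode_Decode(name.c_str(), name.size(), "cp1252", "strict")), but treat that as an untested sketch.

```python
import ctypes

# C signature:
#   PyObject *PyUnicode_Decode(const char *s, Py_ssize_t size,
#                              const char *encoding, const char *errors)
decode = ctypes.pythonapi.PyUnicode_Decode
decode.argtypes = [
    ctypes.c_char_p,   # raw bytes, like name.c_str()
    ctypes.c_ssize_t,  # length, like name.size()
    ctypes.c_char_p,   # encoding name
    ctypes.c_char_p,   # error handler ("strict" is an assumption here)
]
decode.restype = ctypes.py_object  # the call returns a new reference to a str

raw = b"\xb0F"  # degree symbol + 'F' in cp1252, as in the question
name = decode(raw, len(raw), b"cp1252", b"strict")
print(name)
```

The explicit length argument means the data does not need to be NUL-terminated, which matches passing name.size() alongside name.c_str().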
Upvotes: 2