Paul Grinberg

Reputation: 1477

python3 C extension module with cp1252 encoded string

I am writing a Python3 extension module for an existing C++ library which returns a string that appears to be in cp1252 encoding. The C++ function signature is

int get_name(std::string& name);

where name is the output parameter. Its c_str() contents look like 0xb0 0x46 0x00: the degree sign in the cp1252 code page, followed by an uppercase F and the terminating NUL character.

In my python extension C++ code, I wrote

std::string name;
int retval = get_name(name);
py_retval = Py_BuildValue((char *) "is#", retval, (name).c_str(), (name).size());

However, this causes the following runtime exception

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 0: invalid start byte

What is the correct way for me to return a cp1252 encoded string to Python?

UPDATE: I figured out that if I use y# instead of s# to return a Python bytes object from the extension, I can convert that bytes object to a string in my Python code with .decode('cp1252'). However, this is an extra step in Python that should be automated in the extension module. Unfortunately, I cannot figure out how to do that.
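For reference, the workaround described in the update can be sketched in Python roughly as follows. The bytes literal is a stand-in for what the extension would return with y#, and decode_name is a hypothetical helper name used only for illustration.

```python
# Stand-in for the bytes object the extension returns when built with "y#":
# 0xb0 is the degree sign in cp1252, 0x46 is 'F'.
raw_name = b"\xb0F"


def decode_name(raw: bytes) -> str:
    """Hypothetical helper: decode the library's cp1252 bytes into a str."""
    return raw.decode("cp1252")


# Decoding as UTF-8 fails the same way "s#" does, because a lone 0xb0
# is not a valid UTF-8 start byte.
try:
    raw_name.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

name = decode_name(raw_name)
```

This works, but every caller has to remember the decode step, which is the part that ideally moves into the extension.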

Upvotes: 3

Views: 324

Answers (1)

Davis Herring

Reputation: 40053

PyUnicode_Decode can do this for any standard encoding without creating a bytes object first. (You can pass its result to Py_BuildValue with format code N to avoid worrying about reference counts, although that trick doesn’t apply in all cases.)
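To see what PyUnicode_Decode does with these bytes without rebuilding the extension, one option is to call it through ctypes.pythonapi. This is only a sketch and assumes CPython; the byte string is a stand-in for the library's output. In the extension itself, the equivalent call would presumably be PyUnicode_Decode(name.c_str(), name.size(), "cp1252", "strict"), with the result passed to Py_BuildValue using the format "iN".

```python
import ctypes

# Bind the C API function. Its C signature is:
#   PyObject *PyUnicode_Decode(const char *str, Py_ssize_t size,
#                              const char *encoding, const char *errors)
decode = ctypes.pythonapi.PyUnicode_Decode
decode.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t,
                   ctypes.c_char_p, ctypes.c_char_p]
decode.restype = ctypes.py_object  # the new reference is handed to Python

raw_name = b"\xb0F"  # stand-in for name.c_str() and name.size()
decoded = decode(raw_name, len(raw_name), b"cp1252", b"strict")
```

Here decoded is an ordinary str, so the Python caller no longer needs the separate .decode('cp1252') step.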

Upvotes: 2
