Reputation: 31
Pythonistas,
I'm trying to write a Python extension in C that passes a large number of null-terminated, UTF-16 encoded C strings to my Python application. The Unicode strings from my C library are guaranteed to always use 16-bit code units. I'm NOT using wchar_t in my C library on Linux because the size of wchar_t may vary.
I found a lot of functions (PyUnicode_AsUTF8String, PyString_FromStringAndSize, PyString_FromString, etc.) that do exactly what I want, but all of these functions are designed for 8-bit character/string representations.
The Python documentation (http://docs.python.org/howto/unicode.html) says:
"Under the hood, Python represents Unicode strings as either 16- or 32-bit integers, depending on how the Python interpreter was compiled."
I'm really keen to avoid the performance penalty of converting all my UTF-16 C strings to UTF-8 C strings purely for the Python interface, especially on Windows if the Python interpreter uses 16 bits "under the hood" as well.
Any ideas on how to tackle this would be highly appreciated.
Thanks, Thomas
Upvotes: 3
Views: 803
Reputation: 133485
You can't avoid copying the data (unless you break through the Python C API), but you can create Python unicode objects directly from UTF-16 data using PyUnicode_DecodeUTF16; see http://docs.python.org/c-api/unicode.html#utf-16-codecs.
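To make the arguments concrete, here is a rough sketch that calls PyUnicode_DecodeUTF16 through ctypes on Python 3, so it can be tried without building an extension. In a real C extension you would call the function directly with the same four arguments. The helper name utf16_units_to_str and the ctypes array standing in for the C library's buffer are made up for illustration, and forcing the native byte order is an assumption about how the library lays out its strings.

```python
import ctypes
import sys

# Bind the C API function. The C prototype is:
#   PyObject *PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size,
#                                   const char *errors, int *byteorder)
# The first argument is declared as c_void_p here so a ctypes array can be
# passed directly; in C it would simply be a cast to (const char *).
decode_utf16 = ctypes.pythonapi.PyUnicode_DecodeUTF16
decode_utf16.restype = ctypes.py_object
decode_utf16.argtypes = [
    ctypes.c_void_p,
    ctypes.c_ssize_t,
    ctypes.c_char_p,
    ctypes.POINTER(ctypes.c_int),
]


def utf16_units_to_str(units):
    """Decode a null-terminated buffer of 16-bit code units (hypothetical helper)."""
    # Count code units up to the terminator, as a C strlen-style loop would.
    length = 0
    while units[length] != 0:
        length += 1

    # Assumption: the buffer is in native byte order and has no BOM.
    # -1 means little-endian, 1 means big-endian.
    byteorder = ctypes.c_int(-1 if sys.byteorder == "little" else 1)

    # The size argument is in BYTES, not code units, hence length * 2.
    return decode_utf16(units, length * 2, b"strict", ctypes.byref(byteorder))


# Stand-in for a string coming from the C library: "Grüße" plus terminator.
buf = (ctypes.c_uint16 * 6)(0x0047, 0x0072, 0x00FC, 0x00DF, 0x0065, 0)
print(utf16_units_to_str(buf))
```

The two details that are easy to get wrong are the size (bytes, not 16-bit units) and the byteorder pointer: leaving it at 0 makes the decoder look for a BOM at the start of the data, while -1 or 1 fixes the byte order explicitly.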
Upvotes: 2