PyUnicode_FromFormat with (not-unicode) strings

Question

I'm trying to write a representation function for a class, and I want it to be compatible with both Python 2.x and Python 3.x. However, I noticed that passing a normal (non-unicode) string to PyUnicode_FromFormat as %U segfaults. The only viable workaround I found was to convert it to a unicode object myself with PyUnicode_FromObject and then pass the result to PyUnicode_FromFormat:

/* key and value are arguments for the function. */
PyObject *repr;
if (PyUnicode_CheckExact(key)) {
    repr = PyUnicode_FromFormat("%U=%R", key, value);
}
else {
    PyObject *tmp = PyUnicode_FromObject(key);
    if (tmp == NULL) {
        return NULL;
    }
    repr = PyUnicode_FromFormat("%U=%R", tmp, value);
    Py_DECREF(tmp);
}

The point is that I want the representation to be without the "" (or '') that would be added if I use %R or %S.

I only recently found the issue, and I'm using PyUnicode_FromFormat("%U", something); all over the place, so my question is: can this be simplified while keeping it compatible with Python 2.x and 3.x?

DavidW · Accepted Answer

I don't think a much simpler way of doing what you want exists. The best I can see is to drop the if statement and use only your else branch, so that PyUnicode_FromObject is always called:

PyObject *tmp = PyUnicode_FromObject(key);
if (tmp == NULL) {
    return NULL;
}
repr = PyUnicode_FromFormat("%U=%R", tmp, value);
Py_DECREF(tmp);

If you look at the implementation of PyUnicode_FromObject, you'll see that the first thing it does is PyUnicode_CheckExact, and in that case it returns the original object with its reference count incremented. The extra work is therefore pretty small when key is already unicode, and it should be slightly more efficient when key isn't unicode, since you skip the redundant outer check.
