David542

Reputation: 110143

Using C-like arrays in Python

Is the following ever done in Python to minimize the "allocation time" of creating new objects inside a for loop? Or is this considered bad practice / is there a better alternative?

for row in rows:
    data_saved_for_row = []  # re-initializes every time (takes a while)
    for item in row:
        do_something()
    do_something_with_row()

vs. the "c-version" --

data_saved_for_row = [None] * MAX_ROW_LEN  # allocated once, up front
for row in rows:
    for index, item in enumerate(row):
        do_something()
    data_saved_for_row[index + 1] = '\0'  # crude sentinel marking where the
    do_something_with_row()               # row ends, so the list never has
                                          # to be reinitialized

Normally the second approach seems like a terrible idea, but when iterating over a million-plus items I've run into situations where the per-row initialization:

data_saved_for_row = []

has taken a second or more to do.

Here's an example:

>>> print(timeit.timeit(stmt="l = list()", number=int(1e8)))
7.77035903931
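For context, a hedged way to compare the two strategies directly (independent of any particular workload) is to time binding a fresh empty list against reusing a single list via `clear()`:

```python
import timeit

# Cost of binding a fresh empty list on every iteration...
fresh = timeit.timeit("l = []", number=10**6)

# ...versus reusing one existing list and emptying it in place.
reuse = timeit.timeit("l.clear()", setup="l = []", number=10**6)

print(fresh, reuse)
```

Absolute numbers vary by machine and interpreter version; the point is only that both operations are cheap per call, and the cost the question observes comes from doing either a hundred million times.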

Upvotes: 0

Views: 92

Answers (1)

Green Cloak Guy

Reputation: 24691

If you want this sort of performance, you may as well just write it in C yourself and import it with ctypes or something. But then, if you're writing that kind of performance-driven application, why are you using Python in the first place?

You can use list.clear() as a middle-ground here, not having to reallocate anything immediately:

data_saved_for_row = []
for row in rows:
    data_saved_for_row.clear()
    for item in row:
        do_something()
    do_something_with_row()

but this isn't a perfect solution, as shown by the CPython source for `list.clear()` (comments omitted):

static int
_list_clear(PyListObject *a)
{
    Py_ssize_t i;
    PyObject **item = a->ob_item;
    if (item != NULL) {
        i = Py_SIZE(a);
        Py_SIZE(a) = 0;
        a->ob_item = NULL;
        a->allocated = 0;
        while (--i >= 0) {
            Py_XDECREF(item[i]);
        }
        PyMem_FREE(item);
    }

    return 0;
}

I'm not perfectly fluent in C, but this code frees the memory backing the list's item array, so that memory will have to be reallocated every time you add something to the list anyway. This strongly implies that Python just doesn't natively support your approach.
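You can observe this without reading the C source: `sys.getsizeof` on a list includes its allocated item storage, and that storage shrinks back to the empty-list size after `clear()` (a small sketch, not part of the original answer):

```python
import sys

l = list(range(1000))
before = sys.getsizeof(l)  # includes the space allocated for the item slots

l.clear()
after = sys.getsizeof(l)   # back down to roughly the size of an empty list

print(before, after)       # the drop shows clear() released the item array
```

So `clear()` saves you a new object header, but the underlying buffer still has to be regrown as you append.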


Or you could write your own Python data structure (as a subclass of list, maybe) that implements this paradigm (never actually clearing its own storage, but maintaining a running notion of its own logical length), which might be a cleaner solution for your use case than implementing it in C.
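A minimal sketch of such a structure (the class and method names here are illustrative, not from the answer) overwrites stale slots instead of freeing them, so "resetting" is just zeroing a counter:

```python
class ReusableList:
    """A list-like buffer that resets by tracking a logical length
    instead of freeing its storage."""

    def __init__(self):
        self._items = []
        self._length = 0  # number of currently valid items

    def append(self, item):
        if self._length < len(self._items):
            self._items[self._length] = item  # overwrite a stale slot
        else:
            self._items.append(item)          # grow only when needed
        self._length += 1

    def reset(self):
        self._length = 0  # keep the allocated slots for the next row

    def __len__(self):
        return self._length

    def __iter__(self):
        return iter(self._items[:self._length])
```

After the first pass over the longest row, `append` never triggers a reallocation again, which is exactly the property the question's "c-version" was reaching for.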

Upvotes: 1
